At Paymentology, we're redefining what's possible in the payments space. As the first truly global issuer-processor, we give banks and fintechs the technology and talent to launch and manage Mastercard, Visa, and UnionPay cards at scale - across more than 60 countries.
Our advanced multi-cloud platform delivers real-time data, unmatched scalability, and the flexibility of shared or dedicated processing instances. It's this global reach and innovation that sets us apart.
We're looking for a Site Reliability Engineer to ensure the high availability, scalability, and performance of our platform. This role is essential to maintaining reliable systems, reducing operational overhead, and enabling continuous improvement across our global technology landscape. If you're passionate about automation, incident response, and working at the intersection of infrastructure and software, this is your opportunity to help build resilient systems that power financial inclusion worldwide.
What you get to do:
Platform Reliability and Scalability
- Build software that enhances the scalability and reliability of Paymentology's services.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
- Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track the platform's health, cost-effectiveness, reliability, and scalability, and that identify potential issues to feed back to product and platform engineering in a continuous improvement loop.
- Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
- Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution
- Play an active role in the incident response teams, diagnosing and resolving production issues quickly to minimise downtime.
Standards Compliance
- Support product teams in building services that adhere to our security and quality standards.
Cross-team Collaboration
- Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle. We seek to achieve this through advocacy and developing a culture of reliability.
What you can look forward to:
At Paymentology, it's not just about building great payment technology; it's about building a company where people feel they belong and their work matters. You'll be part of a diverse, global team that's genuinely committed to making a positive impact through what we do. Whether you're working across time zones or getting involved in initiatives that support local communities, you'll find real purpose in your work - and the freedom to grow in a supportive, forward-thinking environment.
Travel:
< 10%
Requirements:
What it takes to succeed:
- Strong understanding of cloud networking principles.
- Proficiency with leading monitoring tools such as Datadog, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
- Programming expertise, especially in systems programming languages and databases.
- Familiarity with at least one industry-leading CI/CD tool, such as Jenkins, GitHub Actions, GitLab CI, CodePipeline, CircleCI, or ArgoCD.
- Proven track record of achieving platform-level and end-to-end SLIs, SLOs, and SLAs, and of fostering accountability.
- Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
- Knowledge of implementing solutions to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
- Comprehensive understanding of large-scale distributed platform architecture.
- Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and efficiency at scale.
- Understanding of security best practices within cloud environments.
Education and Experience:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Professionals with a verifiable employment history in the role may also be considered.
- 2 years of experience as a Site Reliability Engineer.
- 2 years in software development.
- Extensive cloud experience, especially with AWS.
- Proven expertise in at least one infrastructure-as-code tool, such as Terraform, CloudFormation, Puppet, or Ansible.
- Hands-on experience with Docker, ECS, EKS, and Kubernetes.
Remote Work:
Yes
Employment Type:
Full time