Site Reliability Engineer (4024)

GBG

Job Location:

Kuala Lumpur - Malaysia

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Description

Enabling safe and rewarding digital lives for genuine people everywhere

We make it our mission to ensure more genuine people have digital access to opportunities and businesses have access to more genuine people. Our technology draws on diverse and reliable data to create a single point of truth for identity and address verification.

With over 30 years of experience behind us, our team and technology are focused on enabling safe and rewarding digital lives for everyone. Regardless of age, location or background, genuine people everywhere should be able to digitally prove who they are and where they live.

About the team and role

Global Fraud Solutions

The team provides decision support solutions to address business objectives in risk prevention and fraud detection. We deliver software solutions and offer client support using our expertise and a client-focused approach.

Site Reliability Engineer

The SRE will build and operate the reliability, observability and operational excellence infrastructure underpinning the GFS managed fraud detection platforms. You will work across deployment pipelines, cloud infrastructure, monitoring and incident management, ensuring GBG can deliver on high-availability SLAs for banking and fintech customers who depend on real-time fraud detection at scale.

What you will do

Design and operate the SRE practice for Managed offerings, including on-call processes, SLA frameworks, incident response playbooks and post-incident review (PIR) processes.
Build and maintain observability infrastructure: centralised logging (correlation IDs), metrics, dashboards, distributed tracing and alerting for the Predator/Instinct platform stack.
Define and track SLOs (Service Level Objectives) and error budgets for real-time transaction processing pipelines, targeting high TPS and low round-trip latency.
Manage cloud infrastructure provisioning and configuration using IaC tooling (Terraform, Helm), supporting both AWS/Azure cloud deployments and on-premises customer environments.
Implement and maintain CI/CD pipelines for GFS solutions (Jenkins etc.).
Work with Engineering teams to ensure security and compliance readiness for Managed services, including PCI DSS, ISO 27001, SOC 1/2/3 and PDPA/GDPR, in close coordination with InfoSec teams.
Drive platform resilience improvements: high availability, auto-scaling, disaster recovery, backup/restore procedures and chaos engineering practices.
Manage secrets, certificate rotation, identity/access controls (OAuth/RBAC) and vulnerability management for the hosted environment.
Support performance testing methodology and baseline establishment for our products.
Contribute to the Architecture Review Committee (ARC) with SRE and operational perspectives on technology choices.
Collaborate with engineering squads to embed reliability and DevSecOps practices across the SDLC.

Skills we're looking for

Minimum 5 years of solid hands-on experience in a Site Reliability, Platform Engineering or DevOps role, ideally supporting mission-critical, real-time processing systems in banking, payments or fintech.
Strong proficiency with cloud platforms (AWS preferred; Azure/GCP acceptable), including networking, compute, storage and managed services.
Deep expertise with containerisation and orchestration: Docker, Kubernetes (EKS/AKS/GKE), Helm and associated tooling.
Infrastructure as Code experience: Terraform (required) and familiarity with Ansible or Pulumi.
Observability stack proficiency: Prometheus, Grafana, ELK/OpenSearch, Jaeger/Zipkin or equivalent enterprise-grade tooling.
CI/CD pipeline design and management: GitHub Actions, Jenkins, ArgoCD or equivalent.
Experience with security and compliance frameworks applicable to hosted financial services: PCI DSS, ISO 27001, SOC 1/2/3, GDPR/PDPA.
Familiarity with database reliability practices for SQL Server, PostgreSQL and Oracle, including replication, read replicas and failover.
Working knowledge of secrets management (HashiCorp Vault, AWS Secrets Manager) and zero-trust identity principles.
Experience supporting real-time streaming or event-driven architectures (Kafka, RisingWave or similar) in production environments.
Scripting and automation proficiency: Python, Bash or Go for operational tooling.
Strong sense of operational ownership and accountability; comfortable being on-call and driving incidents to resolution.
Excellent communication skills; able to produce clear incident reports, runbooks and architecture documentation for both technical and executive audiences.
Proactive mindset: identifies reliability risks before they become incidents and champions a culture of blameless post-mortems.
Collaborative and effective working with software engineers, product managers and InfoSec teams.
Continuous improvement orientation; always looking to reduce toil, automate repetitive tasks and improve platform resilience.
Flexible and adaptable; able to support a globally distributed product with customers across multiple time zones.

To find out more

As an equal opportunity employer, we are dedicated to creating a diverse and inclusive workplace where everyone feels valued and empowered. Please inform your GBG Talent Attraction Partner if you require any reasonable adjustments to the interview process.

To chat to the Talent Attraction team and find out more about our benefits and why we're a great place to work, drop an email to and we'll be in touch. You can also find out more about careers at GBG and check out our current opportunities at

About Company

GBG

We are GBG, global specialists in digital identity. We enable fast, simple and compliant customer onboarding, reducing the risk of fraud for many of the world’s leading organisations. Working with the best data, the best technology and the best people, we make it possible to balance t ...
