Role Summary
We are seeking a highly skilled and motivated Lead Site Reliability Engineer (SRE) with strong AWS expertise to lead our Service Operations team. You will be responsible for driving SRE practices, ensuring the scalability, reliability, and performance of mission-critical systems for our digital banking clients. This role requires balancing technical depth with leadership capability: setting direction, mentoring engineers, and ensuring service reliability at scale across multiple teams and clients.
Sign-on Bonus: Eligible for candidates who are currently employed elsewhere and able to join GFT within 30 days of offer acceptance.
Key Responsibilities
Leadership & Mentorship: Lead a team of SREs, providing technical guidance and coaching while fostering a culture of reliability and continuous improvement.
SRE Practices: Define and mature SRE practices, including SLIs/SLOs, error budgets, and incident response processes, across production systems.
Architecture & Automation: Own the design and evolution of automated cloud operations, driving adoption of Infrastructure-as-Code (Terraform, CloudFormation) and CI/CD pipelines.
Incident Management: Lead major incident responses, ensuring rapid resolution, root cause analysis, and implementation of preventive measures.
Collaboration: Work closely with Development, DevOps, and Cloud Engineering teams to ensure reliability and resilience are built into every stage of delivery.
Operational Excellence: Establish and track key reliability metrics (availability, latency, error rates) and drive initiatives to continuously improve them.
Innovation & Tooling: Evaluate and implement AWS-native and third-party tools to improve monitoring, alerting, and automation.
Stakeholder Engagement: Act as the primary point of contact for service reliability topics with clients, ensuring transparency and alignment on reliability goals.
Governance: Ensure compliance with industry standards and internal policies around security, audit, and operational risk.
Required Education & Experience
Experience: 7-10 years in SRE/DevOps/Cloud Engineering, with at least 2-3 years in a lead or managerial capacity.
Cloud Expertise: Deep hands-on experience with AWS services (EC2, ECS/EKS, S3, RDS, IAM, VPC, CloudWatch).
Infrastructure as Code: Strong experience with Terraform, CloudFormation, and automated deployment pipelines (Harness, GitLab, Jenkins).
Containerization & Orchestration: Expertise in Kubernetes and container-based workloads in production.
Monitoring & Observability: Proficiency with monitoring, logging, and alerting tools (CloudWatch, Prometheus, Grafana, ELK).
Incident Leadership: Proven ability to lead high-pressure incident response and post-mortem processes.
Problem-Solving & Risk Management: Strong analytical skills with the ability to anticipate, assess, and mitigate technical risks.
Collaboration & Communication: Excellent stakeholder management skills; fluent English required, with good communication in Vietnamese for local collaboration.
Nice-to-Have Skills
Certifications such as AWS Certified DevOps Engineer - Professional or AWS Certified Solutions Architect - Professional.
Experience in financial services or other highly regulated industries.
Knowledge of advanced security practices and compliance frameworks (PCI-DSS, ISO 27001, SOC 2).
Multi-region/multi-AZ architecture design for high availability and disaster recovery.
What We Offer You
Competitive salary and benefits package.
13th-month salary guarantee.
Performance bonus.
Professional English courses.
Premium health insurance.
Extensive annual leave and flexible working arrangements.
Opportunity to shape the SRE function and drive reliability practices for leading digital banking clients.
Due to the high volume of applications we receive, we are unable to respond to every candidate individually. If you have not received a response from GFT regarding your application within 10 working days, please assume that we have decided to proceed with other candidates. We truly appreciate your interest in GFT and thank you for your understanding.
We see opportunity in technology. In domains such as cloud, AI, mainframe modernisation, DLT and IoT, we blend established practice with new thinking to help our clients stay ahead.