- Take ownership of maintaining the stability and reliability of mission-critical deployments.
- Guide the team in delivering timely fixes and workarounds for application and system service issues, adhering to strict SLAs.
- Direct the team in diagnosing and resolving issues across application and system components.
- Oversee the preparation and review of RCA documentation to ensure thorough post-incident analysis.
- Actively participate in war room situations during critical incidents, providing leadership and direction.
- Engage with clients during incidents, system updates, and upgrade activities, ensuring clear communication and alignment.
- Serve as a technical role model and mentor, setting high standards for quality and professionalism.
- Provide both technical and managerial leadership to drive team performance and collaboration.
Requirements
- Bachelor's degree in Computer Science, Software Engineering, or a related discipline, or an equivalent qualification.
- Minimum of 12 years of overall experience, with at least 8 years focused on application support.
- Extensive background in managing and supporting distributed systems.
- Proficient in troubleshooting tools and techniques for Java and C++ based services.
- Skilled in identifying and resolving performance issues across backend systems, databases, message brokers, and load balancers.
- Hands-on experience with DevOps tools such as Jenkins, Terraform, and Helm charts.
- Familiar with system monitoring solutions, particularly Prometheus.
- Deep knowledge of databases, Kubernetes, various load balancers, and Google Cloud Platform (GCP).
- Strong background in Linux/Unix system administration and solid understanding of operating system fundamentals.