Position: Site Reliability Engineer (SRE), Kubernetes
Experience: 4+ Years
Education: BTech/BE in Computer Science or IT, MS in IT, MTech, or MSc IT
No of Positions: 2
Mode: On-site
Location: Bangalore, Ahmadabad, Hyderabad
ABOUT AJMERA INFOTECH
Established in 2012, our company specializes in software research and development, focusing on high-availability and mission-critical systems. We are proud to be the architects behind the technology powering some of the top 250 banks and wealth management organizations.
Join Ajmera Infotech: Where Cutting-Edge Technology Meets Innovation
At Ajmera Infotech, we're on a mission to redefine the digital landscape through our technological expertise and innovative solutions. We're seeking developers with a passion for technology and a desire to tackle complex challenges. Here's what sets us apart:
Tech Expertise & Innovation:
Diverse Tech Stack: Work with a wide array of technologies, including scalable web applications, mobile apps, and cloud-native solutions across platforms like AWS, Google Cloud, and Azure.
Leading-Edge Projects: Engage in projects involving AI, machine learning, IoT, and blockchain, pushing the boundaries of what's possible.
Continuous Learning: Stay at the forefront of technology with access to the latest tools, resources, and training.
Position Overview:
We are looking for a Senior Site Reliability Engineer / DevOps Engineer to join our team and play a key role in scaling, automating, and securing our large-scale distributed systems. You will be responsible for maintaining the high availability, performance, and reliability of production environments for our core product, which is used by millions of people globally.
Key Responsibilities:
Design, build, and maintain highly available, scalable, and resilient infrastructure for large-scale product environments.
Architect and maintain CI/CD pipelines using tools like GitLab CI, Jenkins, or GitHub Actions.
Implement and manage infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation.
Manage Kubernetes clusters across cloud and on-prem platforms (EKS, AKS, GKE).
Automate and improve system monitoring and alerting (Prometheus, Grafana, ELK, Datadog, etc.).
Ensure system reliability and uptime, and own performance tuning and capacity planning.
Collaborate with developers to implement service reliability best practices (SLAs, SLOs, SLIs).
Champion a DevSecOps culture by integrating security practices into the development pipeline.
Troubleshoot complex production issues and perform root cause analysis.
Lead incident management and postmortem processes.
Participate in the on-call rotation and production deployments.
Requirements
4+ years of experience in SRE/DevOps roles with a focus on production-grade, large-scale systems.
Strong hands-on experience with Kubernetes, containerization (Docker), and orchestration.
Expertise in public cloud platforms (AWS, Azure, or GCP).
Proficiency in Terraform, Helm, and infrastructure automation.
Experience with observability tools such as Prometheus, Grafana, ELK, Datadog, Splunk, etc.
Strong scripting skills (Bash, Python, Go, etc.).
Deep understanding of networking, security, and Linux system internals.
Familiarity with microservices architecture, service meshes, and API gateways.
Experience with zero-downtime deployments and blue-green/canary strategies.
Good to Have:
Certification in AWS, Azure, or GCP.
Experience working in highly regulated industries (e.g., finance, healthcare).
Knowledge of chaos engineering and fault-injection tools.
Familiarity with database reliability (MySQL, Postgres, MongoDB, etc.).
Benefits
CULTURE OF RESEARCH AND DEVELOPMENT
Learning and delivering are at the core of our culture. We are a learning-centric organization that constantly strives to stay at the edge of technology, and we take pride in delivering world-class software solutions. We make significant investments in the continuous learning and upskilling of our team.
Ajmera Infotech is firmly committed to being an equal opportunity employer and to maintaining a diverse and inclusive environment. We value and embrace the fact that every single one of us brings something to the table. But sometimes we hold back when we don't meet 100% of a job description's criteria; maybe you're feeling that way right now. We encourage you to apply anyway, because we want you to be you, with us.
Education
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).