SRE & DevOps Engineer

Pune - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

We are looking for

An experienced SRE & DevOps Engineer with deep expertise in cloud infrastructure, automation, and observability
A hands-on engineer who ensures the reliability, performance, and scalability of systems
A proactive problem solver with a strong focus on operational excellence and continuous improvement
A collaborator who bridges development and operations through modern DevOps and SRE practices
An effective communicator who thrives in cross-functional teams and drives best practices

This role matters to us

The Senior SRE & DevOps Engineer plays a vital role in ensuring the resilience, scalability, and reliability of our systems. By applying modern SRE principles, automation, and incident management practices, you will enable faster, more reliable delivery of business value while safeguarding system stability and customer trust.

Key Responsibilities

Design, implement, and maintain scalable, secure, cloud-native infrastructure
Set up and maintain observability solutions, including monitoring, alerting, logging, and tracing (e.g. Prometheus, Grafana, ELK, DataDog)
Continuously improve CI/CD pipelines and automate deployment workflows to increase delivery efficiency
Lead structured incident response and root cause analysis, and drive a culture of post-mortem learning
Collaborate closely with developers, QA, and architects to ensure seamless integration and performance optimization
Apply SRE principles (SLIs, SLOs, SLAs, error budgets) to guide operational decisions and system reliability
Champion Infrastructure-as-Code practices using Terraform, Helm, or Ansible
Ensure security, compliance, and reliability are embedded into operations
Mentor team members and foster a culture of operational excellence and continuous improvement

Qualifications

Education

Bachelor's or Master's degree in Computer Science, Engineering, or equivalent practical experience

Work Experience

6 to 8 years of proven experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
Hands-on expertise with Kubernetes (preferably GKE), Docker, and service mesh technologies like Istio
Strong background in CI/CD practices and tools (GitHub Actions, Jenkins X, ArgoCD, or similar)
Experience with observability solutions (Prometheus, Grafana, ELK, Jaeger, DataDog, GCP Dashboards)
Proficiency with at least one major cloud platform (GCP, AWS, Azure)
Scripting or programming experience (Python, Go, Bash, or similar)
Practical knowledge of Infrastructure-as-Code tools like Terraform, Helm, or Ansible
Hands-on experience managing incidents, troubleshooting, and performing root cause analysis
Familiarity with SRE practices (SLIs, SLOs, SLAs, error budgets)

Other Requirements

Strong communication and collaboration skills across cross-functional teams
Ability to balance short-term operational needs with long-term scalability and system health
Analytical and proactive mindset with a focus on continuous improvement
Fluency in English (written and spoken)

Nice-to-Have

Experience with security best practices in distributed systems (OAuth2, mTLS, RBAC)
Knowledge of cost optimization and cloud governance practices
Familiarity with Camunda/CIB7 environments
Contributions to open-source DevOps/SRE communities

Remote Work :

Employment Type: Full-time

About Company

METROMAKRO

METRO is a leading international wholesale company with food and non-food assortments that specialises in serving the needs of hotels, restaurants and caterers (HoReCa) as well as independent traders. Around the world, METRO has 15 million customers who can choose whether to shop in o ...
