
Job Location: Bangalore - India

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1

Job Summary

Mandatory Skills
Python, Site Reliability Engineer, ELK
Skills to Evaluate
Python, Site Reliability Engineer, ELK, AWS, GCP, Kubernetes, Docker, Ansible, Packer, Jenkins, Splunk, Cribl, Terraform, Vectors, Prometheus, Linux, Helm, Datadog

Job Description
We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems. This role is highly hands-on and focuses on designing, building, and operating reliable, observable, and scalable platforms running on Kubernetes, with a strong preference for Google Cloud Platform (GCP) and AWS.

Roles & Responsibilities
Reliability & Operations
  • Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments
  • Define and enforce SLOs, SLIs, and error budgets
  • Lead incident response, root cause analysis (RCA), and postmortems
  • Drive reliability improvements through automation

Observability (Core Focus)
  • Architect and operate observability platforms for metrics, logging, tracing, and alerting
  • Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, and Loki / ELK / OpenSearch
  • Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
  • Establish actionable alerting standards

Cloud & Platform Engineering
  • Build and manage infrastructure on GCP (preferred) or AWS
  • Operate Kubernetes clusters (GKE preferred)
  • Deploy services using Helm
  • Manage containerized workloads using Docker

Automation & Tooling
  • Strong Python skills with emphasis on reliability automation and observability tooling
  • Develop automation and tooling using Python
  • Create internal reliability and monitoring tools
  • Integrate CI/CD pipelines with observability and reliability checks

Collaboration & Leadership
  • Mentor junior engineers
  • Influence architecture decisions
  • Collaborate across engineering teams