SRE Engineer

Virtusa

Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Mandatory Skills

Python, Site Reliability Engineer, ELK

Skill to Evaluate

Python, Site Reliability Engineer, ELK, AWS, GCP, Kubernetes, Docker, Ansible, Packer, Jenkins, Splunk, Cribl, Terraform, Vector, Prometheus, Linux, Helm, Datadog

Job Description

We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems. This role is highly hands-on and focuses on designing, building, and operating reliable, observable, and scalable platforms running on Kubernetes, with a strong preference for Google Cloud Platform (GCP) and AWS.

Senior Site Reliability Engineer

Roles & Responsibilities

Reliability & Operations

- Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments

- Define and enforce SLOs, SLIs, and error budgets

- Lead incident response, RCA, and postmortems

- Drive reliability improvements through automation

Observability (Core Focus)

- Architect and operate observability platforms for metrics, logging, tracing, and alerting

- Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, and Loki / ELK / OpenSearch

- Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)

- Establish actionable alerting standards

Cloud & Platform Engineering

- Build and manage infrastructure on GCP (preferred) or AWS

- Operate Kubernetes clusters (GKE preferred)

- Deploy services using Helm

- Manage containerized workloads using Docker

Automation & Tooling

- Strong Python skills with emphasis on reliability automation and observability tooling

- Develop automation and tooling using Python

- Create internal reliability and monitoring tools

- Integrate CI/CD pipelines with observability and reliability checks

Collaboration & Leadership

- Mentor junior engineers

- Influence architecture decisions

- Collaborate across engineering teams
