About:
Virtusa is a global IT services company offering digital transformation, engineering, and consulting, helping businesses with cloud, AI, and legacy system modernization across industries such as finance, healthcare, and technology. Founded in Sri Lanka in 1996, it is now headquartered in Massachusetts and serves major clients worldwide through its technical expertise and strategic partnerships, including one with Google Cloud. The company focuses on product development, platform engineering, and improving experiences through technology, though employee reviews highlight varied experiences.
Skills
Python, Site Reliability Engineer, ELK, AWS, GCP, Kubernetes, Docker, Ansible, Packer, Jenkins, Splunk, Cribl, Terraform, Vectors, Prometheus, Linux, Helm, Datadog
Job Description
We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems. This role is highly hands-on and focuses on designing, building, and operating reliable, observable, and scalable platforms running on Kubernetes, with a strong preference for Google Cloud Platform (GCP) and AWS.
Roles & Responsibilities
Reliability & Operations
- Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments
- Define and enforce SLOs, SLIs, and error budgets
- Lead incident response, root cause analysis (RCA), and postmortems
- Drive reliability improvements through automation
Observability (Core Focus)
- Architect and operate observability platforms for metrics, logging, tracing, and alerting
- Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, and Loki / ELK / OpenSearch
- Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
- Establish actionable alerting standards
Cloud & Platform Engineering
- Build and manage infrastructure on GCP (preferred) or AWS
- Operate Kubernetes clusters (GKE preferred)
- Deploy services using Helm
- Manage containerized workloads using Docker
Automation & Tooling
- Strong Python skills with emphasis on reliability automation and observability tooling
- Develop automation and tooling using Python
- Create internal reliability and monitoring tools
- Integrate CI/CD pipelines with observability and reliability checks
Collaboration & Leadership
- Mentor junior engineers
- Influence architecture decisions
- Collaborate across engineering teams