Site Reliability Engineer

Right Advisors

Job Location:

Gurgaon - India

Monthly Salary: L 5 - 10

Experience Required: 3-7 years

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About the Role

We are seeking a proactive and detail-oriented Site Reliability Engineer (SRE) with 3 years of experience to ensure high availability, reliability, and performance of production systems. This role focuses on automation, incident management, and cross-team coordination to drive operational excellence.

Key Responsibilities

Maintain reliable, scalable, and secure production environments.

Implement and manage monitoring, alerting, and logging solutions.

Contribute to defining and tracking SLIs/SLOs and support error budget practices.

Automate operational tasks to improve efficiency and reduce manual effort.

Perform troubleshooting and Root Cause Analysis (RCA) for production incidents.

Optimize system performance, availability, and capacity.

Maintain SOPs and incident documentation in Confluence.

Adhere to change management, deployment governance, and disaster recovery standards.

Support incident response for critical production services.

Collaboration & Tools

Coordinate with external vendors and internal cross-functional teams.

Work closely with Engineering, Product Owners, and Operations teams.

Manage incidents and changes using ServiceNow & JIRA.

Collaborate through Slack and structured communication channels.

Technical Skills

Systems & Cloud

Strong knowledge of Windows and Linux/Unix systems

Solid understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).

Experience with at least one cloud platform (AWS, Azure, or GCP).

Automation & CI/CD

Proficiency in one scripting/programming language (Python, Go, Bash, PowerShell, or Java).

Understanding of CI/CD pipelines and automation practices.

Containers

Hands-on experience with Docker and Kubernetes

Experience with monitoring tools such as Power BI.

Ability to analyze logs, metrics, and traces for troubleshooting.

ITSM & Documentation

Experience with ServiceNow & JIRA (incident/change/problem workflows)

Working knowledge of Confluence for technical documentation and knowledge management.

Additional Experience (Preferred)

Background in DevOps, Cloud Engineering, or Platform Engineering.

Understanding of security best practices and compliance standards.

Familiarity with AI-assisted engineering tools (Claude Code, Jellyfish, GitHub Copilot).

Exposure to large-scale or production-grade systems.

Soft Skills

Strong analytical and troubleshooting mindset

Excellent written and verbal communication skills

Ownership-driven and composed during high-severity incidents.

Accessibility & Inclusion Statement

We are committed to creating an inclusive environment for all employees, including persons with disabilities. Reasonable accommodations will be provided upon request.

Required Skills:

SRE, DevOps, TCP/IP, DNS, Linux, AWS, Azure
