Site Reliability Engineer - DevOps with Networking (DNS, TCP/IP)
Job Summary
- Title : Site Reliability Engineer
- Experience : 3 years
- Location : Gurgaon
- Employment Type : Full Time
- Notice Period : Immediate Joiner
Job Description
About the Role
We are seeking a proactive and detail-oriented Site Reliability Engineer (SRE) with 3 years of experience to ensure high availability, reliability, and performance of production systems.
This role focuses on automation, observability, incident management, and cross-team coordination to drive operational excellence.
Key Responsibilities
Maintain reliable, scalable, and secure production environments.
Implement and manage monitoring, alerting, and logging solutions.
Contribute to defining and tracking SLIs/SLOs and support error budget practices.
Automate operational tasks to improve efficiency and reduce manual effort.
Perform troubleshooting and Root Cause Analysis (RCA) for production incidents.
Optimize system performance, availability, and capacity.
Maintain runbooks, SOPs, and incident documentation in Confluence.
Adhere to change management, deployment governance, and disaster recovery standards.
Support incident response for critical production services.
Collaboration & Tools
Coordinate with external vendors and internal cross-functional teams.
Work closely with Engineering, Product Owners, and Operations teams.
Manage incidents and changes using ServiceNow & JIRA.
Collaborate through Slack and structured communication channels.
Technical Skills
Systems & Cloud
Strong knowledge of Windows and Linux/Unix systems.
Solid understanding of networking fundamentals (DNS, TCP/IP, load balancing, firewalls).
Experience with at least one cloud platform (AWS, Azure, or GCP).
Automation & CI/CD
Proficiency in one scripting/programming language (Python, Go, Bash, PowerShell, or Java).
Understanding of CI/CD pipelines and automation practices.
Containers & Observability
Hands-on experience with Docker and Kubernetes.
Experience with monitoring tools such as Grafana or Power BI.
Ability to analyze logs, metrics, and traces for troubleshooting.
ITSM & Documentation
Experience with ServiceNow & JIRA (incident/change/problem workflows).
Working knowledge of Confluence for technical documentation and knowledge management.
Additional Experience (Preferred)
Background in DevOps, Cloud Engineering, or Platform Engineering.
Understanding of security best practices and compliance standards.
Familiarity with AI-assisted engineering tools (Claude Code, Jellyfish, GitHub Copilot).
Exposure to large-scale or production-grade systems.
Soft Skills
Strong analytical and troubleshooting mindset
Excellent written and verbal communication skills
Effective stakeholder and vendor coordination
Ownership-driven and composed during high-severity incidents
Required Skills:
Site Reliability, DevOps, DNS/TCP/IP, Networking, Linux, ServiceNow, Docker, Jira, CI/CD, AWS/Azure/GCP.
Required Education:
Any Science Background
About Company
We are one of the fastest-growing HR services organizations. We create long-term, sustainable partnerships with our clients by providing resource solutions to meet their business needs. Expertise and leadership propelled with a successful service model, which is intrinsic to our clients ...