Senior Site Reliability Engineer


Job Location:

Deerfield, WI - USA

Monthly Salary: Not Disclosed
Posted on: 16 days ago
Vacancies: 1 Vacancy

Job Summary

I hope you are doing well today. Please review the job description below and share your interest accordingly.

Job Details:

Role: Site Reliability Engineer

Location: Deerfield, IL (Onsite)

Experience Required: 6 Years

Job Type: Full-Time with TCS

Salary Range: $120,000 to $140,000 per year

Job Description

Must Have Technical/Functional Skills

  • 7 years of experience in SRE, platform engineering, or cloud infrastructure engineering in large-scale enterprise environments (10,000 employees or equivalent complexity).
  • Deep hands-on expertise with Microsoft Azure - minimum 4 years in a primary Azure cloud engineering role.
  • Expert-level proficiency with AKS: cluster lifecycle management, RBAC, network policies, pod security standards, cluster autoscaler, and Workload Identity.
  • Strong infrastructure-as-code skills: Terraform (required) and/or Bicep; experience managing Azure Landing Zones or Enterprise-Scale architecture.
  • Proficiency in at least one systems programming/scripting language: Python (preferred), Go, or PowerShell.
  • Experience designing and operating enterprise observability platforms using Azure Monitor, Log Analytics, and Application Insights at scale.
  • Demonstrable track record of owning SLOs/SLIs and delivering measurable reliability improvements in production.
  • Strong knowledge of enterprise networking in Azure: Hub-and-Spoke/Virtual WAN, ExpressRoute, Azure Firewall, NSGs, Private Endpoints, and DNS Private Zones.

Required/Preferred Certifications:

  • AZ-104, AZ-305 (Preferred), AZ-400 (Preferred), CKA, ITIL v4 Foundation

Roles & Responsibilities

Reliability & Availability Engineering

  • Define, own, and enforce enterprise-wide SLOs, SLIs, and Error Budgets across all Tier-0 and Tier-1 Azure-hosted services; report SLA compliance to executive stakeholders monthly.
  • Lead architectural reviews for new services and ensure reliability non-functionals (availability targets, RTO/RPO) are embedded from design through to production.
  • Champion and implement chaos engineering practices using Azure Chaos Studio and custom fault injection frameworks to proactively surface reliability risks.
  • Drive Disaster Recovery (DR) design and conduct quarterly DR drills across Azure paired regions.

Incident Management & On-Call

  • Serve as Incident Commander for P1/P2 major incidents; own the end-to-end incident lifecycle from detection through resolution and Post-Incident Review (PIR).
  • Participate in a structured On-Call rotation with follow-the-sun global coverage; maintain response SLAs of < 5 minutes for Tier-0 services.
  • Drive a blameless post-mortem culture and ensure all action items from PIRs are tracked and delivered within the agreed SLA.

Observability & Platform Engineering

  • Design and operate the enterprise observability stack: Azure Monitor, Log Analytics Workspaces, Application Insights, and Azure Managed Grafana; ensure full MELT (Metrics, Events, Logs, Traces) coverage.
  • Build and maintain alerting frameworks using Azure Monitor Alert Rules and Azure Action Groups integrated with PagerDuty and ServiceNow.
  • Develop and operate platform automation runbooks and self-healing capabilities using Azure Automation, Logic Apps, and Python/PowerShell scripting.

CI/CD & Infrastructure Reliability

  • Collaborate with DevOps and development teams to embed reliability gates into Azure DevOps pipelines: automated performance testing, synthetic monitoring, and progressive deployment (canary/blue-green) strategies.
  • Manage reliability of AKS clusters across multiple Azure regions; own node pool scaling, upgrade strategy, and cluster hardening in alignment with CIS Benchmarks.
  • Contribute to infrastructure-as-code reliability reviews using Terraform/Bicep to enforce standards across Azure Landing Zones.

Kind Regards

Yogesh Kumar

Sr. IT Recruiter
