Sr. DevOps/Site Reliability Engineer

Job Location: Chicago, IL - USA

Monthly Salary: Not Disclosed
Experience Required: 5 years
Posted on: Yesterday
Vacancies: 1

Job Summary

We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.

Requirements

Key Responsibilities:
Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace/Datadog
Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
Optimize systems for cost, performance, and reliability
Drive chaos engineering and resilience testing
Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
Mentor junior SREs and promote DevOps/SRE culture across the organization

Basic Qualifications:
Strong experience in SRE, DevOps, or Cloud Engineering
Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
Hands-on experience with Terraform, Ansible, or other IaC tools
Strong scripting/coding skills (Python, Go, Shell, etc.)
Experience with Kubernetes, containerization, and orchestration
Deep knowledge of Linux systems and networking

Preferred Qualifications:
Experience with service meshes (e.g., Istio, App Mesh)
Familiarity with the AWS Well-Architected Framework
Experience building self-healing systems and automated remediation
Background in security, compliance, or multi-account/multi-region AWS architectures

Certifications (Optional/Preferred):
AWS Certified DevOps Engineer - Professional
AWS Certified Solutions Architect - Professional


Company Industry

IT Services and IT Consulting

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting