Required Skills:
Key Responsibilities:
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace/Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the organization

Basic Qualifications:
- Strong experience in SRE, DevOps, or Cloud Engineering
- Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
- Hands-on experience with Terraform, Ansible, or other IaC tools
- Strong scripting/coding skills (Python, Go, Shell, etc.)
- Experience with Kubernetes, containerization, and orchestration
- Deep knowledge of Linux systems and networking

Preferred Qualifications:
- Experience with Service Meshes (e.g., Istio, App Mesh)
- Familiarity with AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures

Certifications (Optional/Preferred):
- AWS Certified DevOps Engineer - Professional
- AWS Certified Solutions Architect - Professional
IT Services and IT Consulting