Sr. DevOps Engineer
Atlanta, GA - USA
Job Summary
We are seeking an experienced Senior DevOps Engineer / Site Reliability Engineer (SRE) to build, automate, and maintain scalable, secure, and highly available infrastructure platforms. The ideal candidate should have strong expertise in DevOps practices, cloud infrastructure, CI/CD automation, Kubernetes, monitoring, and production support, with a reliability-focused mindset.
Key Responsibilities
Design, implement, and maintain scalable CI/CD pipelines and deployment automation.
Manage cloud infrastructure and containerized environments across development, staging, and production.
Build and maintain highly available, fault-tolerant systems and services.
Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, or similar tools.
Administer Kubernetes clusters and container orchestration platforms.
Monitor application and infrastructure health using observability tools.
Define and manage SLIs, SLOs, and SLAs to improve system reliability.
Perform incident management, root cause analysis (RCA), and postmortem reviews.
Automate operational tasks to improve efficiency and reduce manual intervention.
Collaborate with development, QA, and security teams on release and deployment activities.
Improve system performance, scalability, and disaster recovery processes.
Ensure that security, compliance, backup, and recovery best practices are followed.
Support production systems and participate in on-call rotations when required.