Site Reliability Engineer (SRE) Full Time

Atlanta, GA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

This job posting is outdated and the position may be filled.

Job Summary

Role: Site Reliability Engineer (SRE)

Location/s: Atlanta, GA / Bellevue, WA / Frisco, TX / Overland Park, KS (Onsite from Day 1)

Job Type: Full Time

Required Skills:

Reliability Engineering, Kubernetes, Cloud Platform, Python Scripting

The opportunity:

Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
Develop and improve observability using monitoring, logging, and tracing tools (e.g. Prometheus, Grafana, ELK, Datadog, etc.).
Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
Implement and maintain security best practices across environments (e.g. secrets management, IAM, firewalls, etc.).
Maintain disaster recovery plans, backups, and high-availability strategies.

Required:

5 years of experience as an SRE, DevOps Engineer, or in a similar role.
Proficiency in scripting and automation (Bash, Python, Go, etc.).
Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
Solid understanding of Linux systems administration and networking fundamentals.
Experience with cloud platforms (AWS Azure or GCP).
Experience with IaC tools like Terraform or CloudFormation.
Familiarity with GitOps and modern deployment practices.
Hands-on experience with observability tools (e.g. Prometheus, Grafana, Datadog).
Strong troubleshooting and incident response skills.

Preferred:

Experience in a high-traffic, microservices-based architecture.
Exposure to service meshes (Istio, Linkerd).
Certifications (AWS Certified DevOps Engineer, CKA, etc.).
Experience with security automation and compliance (e.g. SOC 2, ISO 27001).

Note: Visa-independent candidates are preferred.
