Job Title: Site Reliability Engineer (SRE)
Location: Alpharetta, GA (local candidates only)
Job Description:
We are looking for an experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in DevOps, cloud infrastructure, automation, monitoring, and system reliability. You will be responsible for ensuring the high availability, scalability, and performance of production systems while driving operational excellence through automation.
Key Responsibilities:
- Design, build, and maintain scalable and reliable infrastructure on AWS, Azure, or GCP.
- Develop automation for deployment, monitoring, and incident response.
- Implement CI/CD pipelines using tools such as Jenkins, GitHub Actions, or GitLab CI.
- Monitor system performance and optimize uptime, latency, and capacity.
- Build and maintain infrastructure as code using Terraform, Ansible, or CloudFormation.
- Collaborate with development teams to improve system reliability and deployment processes.
- Implement robust monitoring, alerting, and logging using Prometheus, Grafana, ELK, or Datadog.
- Participate in on-call rotations, incident response, and root cause analysis.
Required Skills:
- 10 years of experience as an SRE, DevOps Engineer, or Cloud Engineer.
- Hands-on experience with AWS, Azure, or GCP.
- Strong scripting skills in Python, Bash, or Go.
- Proficiency with Docker, Kubernetes, and Helm.
- Experience with Terraform, Ansible, or other IaC tools.
- Expertise in monitoring and observability tools (Prometheus, Grafana, Splunk, ELK, Datadog).
- Solid understanding of Linux system administration and networking concepts.
- Strong troubleshooting and problem-solving skills.
Preferred Skills:
- Experience with microservices and service meshes (Istio, Linkerd).
- Familiarity with security best practices and incident management.
- Experience with performance tuning and capacity planning.
- Exposure to SLA/SLO/SLI management and reliability metrics.
Education: