Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Site Reliability Engineer (SRE)
Location: Atlanta, GA 30346
Duration: 9 Months
Job Type: Temporary Assignment
Work Type: Hybrid
Job Description:
- We need SREs with experience in Kubernetes and cloud platforms. Strong communication skills are essential, since this team will interact with multiple application teams and, at times, even VPs during critical issues.
- Expectations for these resources are high, so we need top-quality candidates with both technical skills and excellent communication abilities.
- We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services.
- You will play a critical role in ensuring the high availability, performance, scalability, and security of our production systems while enabling continuous deployment and rapid delivery of features to our customers.
Key Responsibilities:
- Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
- Develop and improve observability using monitoring, alerting, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
- Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
- Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
- Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
- Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
- Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
- Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).
- Maintain disaster recovery plans, backups, and high-availability strategies.
Qualifications:
Required:
- 5 years of experience in an SRE, DevOps Engineer, or similar role.
- Proficiency in scripting and automation (Bash, Python, Go, etc.).
- Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Solid understanding of Linux systems administration and networking fundamentals.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience with IaC tools like Terraform or CloudFormation.
- Familiarity with GitOps and modern deployment practices.
- Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).
- Strong troubleshooting and incident response skills.
Preferred:
- Experience in a high-traffic, microservices-based architecture.
- Exposure to service meshes (Istio, Linkerd).
- Certifications (AWS Certified DevOps Engineer, CKA, etc.).
- Experience with security automation and compliance (e.g., SOC 2, ISO 27001).
Soft Skills:
- Strong communication and collaboration abilities.
- Ability to thrive in a fast-paced, agile environment.
- Analytical mindset and proactive approach to problem-solving.
- A passion for automation, performance, and system design.
Requirements:
- Skill / Rating / Experience
- Kubernetes
- SRE
- Cloud-based infrastructure (AWS, Azure, or GCP).
- Excellent communication
- Prometheus, Grafana, ELK, Datadog, etc.
- Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
- Scripting and automation (Bash, Python, Go, etc.).
TekWissen Group is an equal opportunity employer supporting workforce diversity.