SRE Lead

Alpharetta, GA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Role: SRE Lead

Location: Alpharetta, GA
Experience Level: Senior / Lead

Role Overview

We are seeking an experienced Site Reliability Engineering (SRE) Lead to own and drive the reliability, scalability, and operational excellence of cloud-native platforms. This role combines hands-on technical depth with people leadership: the SRE Lead manages the SRE team while setting best practices across reliability engineering, automation, observability, and incident management.

The SRE Lead will work closely with engineering, security, and platform teams to ensure systems are resilient, secure, and performant at scale.

Key Responsibilities

Leadership & Ownership

Lead and manage the SRE team, owning end-to-end SRE responsibilities.
Define SRE standards, reliability goals (SLIs/SLOs), and operational best practices.
Mentor engineers and drive a culture of automation, resilience, and continuous improvement.
Act as a key escalation point during critical incidents and outages.

Cloud & Platform Engineering

Design, implement, and manage cloud infrastructure using Google Cloud Platform (GCP) services:
- Compute Engine, GKE, VPC, Cloud IAM, Cloud Storage, Cloud SQL
Ensure high availability, fault tolerance, and scalability across environments.

Networking & Connectivity

Architect and manage:
- VPC peering, Shared VPCs
- Firewall rules, load balancers, DNS
- VPN tunnels and secure hybrid connectivity

Security & Identity

Debug and manage IAM policies and service accounts.
Implement Workload Identity Federation and least-privilege access models.
Partner with security teams to enforce cloud security best practices.

Infrastructure as Code & Automation

Develop and maintain Terraform modules with strong state management and dependency handling.
Apply DRY principles across infrastructure code.
Lead infrastructure automation initiatives to reduce manual intervention.

CI/CD & Deployment Strategies

Design and maintain pipelines using:
- Jenkins (Declarative & Scripted)
- GitHub Actions (YAML workflows)
Implement advanced deployment strategies:
- Canary releases
- Blue/Green deployments
Manage build and release artifacts using Docker and Helm.

Linux & Systems Engineering (Must-Have)

Deep hands-on expertise with RHEL, Ubuntu, and CentOS.
Kernel tuning, systemd, and storage management (LVM).
OS-level performance optimization and troubleshooting.

Observability & Debugging

Diagnose and resolve CPU, memory, disk, and I/O bottlenecks.
Analyze system and application logs.
Troubleshoot boot issues and low-level system failures.
Drive root cause analysis and post-incident reviews.

Programming & Scripting (Must-Have)

Strong proficiency in Python, Go (Golang), or Java for automation and tooling.

Required Skills

Proven experience leading SRE or Platform Engineering teams.
Strong expertise in GCP infrastructure and Kubernetes (GKE).
Advanced Linux systems knowledge.
Infrastructure-as-Code and CI/CD mastery.
Strong debugging, incident response, and reliability engineering skills.

Preferred Qualifications

Certifications:
- Google Professional Cloud DevOps Engineer
- Google Professional Cloud Architect
- CKA (Certified Kubernetes Administrator)
Experience with large-scale distributed systems and microservices.
Familiarity with:
- ITIL processes
- Change Advisory Board (CAB)
- Incident and problem management frameworks

Key Skills

Administrative Skills
Facilities Management
Biotechnology
Creative Production
Design And Estimation
Architecture

About Company

Purple Drive
