Site Reliability Engineer – Lead

Cincinnati, OH - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Role Name: Site Reliability Engineer - Lead

Cincinnati, OH - Hybrid only, on W2

Role Description:

As a Site Reliability Engineer - Lead, you will drive the reliability, scalability, and performance of mission-critical systems and services while leading a team of SREs. This role combines deep technical expertise with leadership, mentoring, and strategic planning. You will set standards for operational excellence, guide incident response, and foster a culture of automation and continuous improvement. Collaboration with engineering, operations, and product teams is essential to align reliability initiatives with business objectives and ensure seamless service delivery.

REQUIRED SKILLS:

Proven experience in site reliability, DevOps, or systems engineering, with prior leadership or team lead responsibilities

Strong programming/scripting skills (e.g. Python, Go, Bash, or similar)

Deep expertise in Linux/Unix system administration and networking

Experience architecting and operating cloud platforms (AWS, Azure, GCP)

Proficiency with infrastructure-as-code and automation tools (e.g. Terraform, Ansible, CloudFormation)

Advanced knowledge of monitoring, logging, and alerting solutions (e.g. Prometheus, Grafana, ELK, Datadog)

Demonstrated incident management and root cause analysis skills

Experience designing and implementing CI/CD pipelines

Strong understanding of containerization and orchestration (Docker, Kubernetes)

Ability to define and enforce reliability, scalability, and security best practices

Excellent communication, stakeholder management, and collaboration skills

Experience mentoring, coaching, and developing SRE or engineering teams

Strong hands-on knowledge of defining business process dashboards in APM tools such as Dynatrace, including the definition, design, and implementation of SLAs, SLOs, and SLIs as part of observability

Experience with devices such as scanners, POS devices, and peripheral devices (including devices with on-device memory)

Experience with hardcoded device protocols and software, including the ability to decode and run them and help integrate them with other modules

Experience in edge computing, Google Distributed Cloud, and hybrid cloud environments

Experience leading SRE teams in high-growth or regulated environments

Advanced database administration and optimization skills (both SQL, e.g. MySQL, and NoSQL, e.g. MongoDB)

Key Responsibilities:

Team Leadership & Development:

Bring technical expertise and hands-on experience, with the ability to lead the development team

Mentor team members and guide them on the right approach for SRE-related work

Foster a culture of operational excellence, automation, and continuous learning

Conduct regular team meetings, 1:1s, and performance reviews

Reliability Strategy & Architecture:

Define and implement reliability, scalability, and performance strategies for critical systems

Set standards for monitoring, alerting, and incident response

Guide architectural decisions to ensure robust, resilient infrastructure

Incident & Problem Management:

Oversee incident response, root cause analysis, and post-mortem processes

Coordinate with cross-functional teams to resolve complex issues and prevent recurrence

Drive improvements based on incident learnings

Process Improvement & Automation:

Identify and eliminate manual operational tasks through automation

Optimize CI/CD pipelines and deployment processes

Continuously enhance system reliability and efficiency

Stakeholder Collaboration:

Partner with engineering, operations, and product teams to align reliability goals with business objectives

Communicate reliability metrics, risks, and progress to leadership and stakeholders

Security & Compliance:

Ensure infrastructure and processes adhere to security best practices and compliance requirements

Experience with chaos engineering and resilience practices

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root Cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Tek Leaders Inc
