Site Reliability Engineer – Lead

Tek Leaders Inc


Job Location:

Cincinnati, OH - USA

Monthly Salary: Not Disclosed
Posted on: 10 hours ago
Vacancies: 1

Job Summary

Role Name: Site Reliability Engineer - Lead

Cincinnati, OH - Hybrid (W2 only)

Role Description:

As a Site Reliability Engineer - Lead, you will drive the reliability, scalability, and performance of mission-critical systems and services while leading a team of SREs. This role combines deep technical expertise with leadership, mentoring, and strategic planning. You will set standards for operational excellence, guide incident response, and foster a culture of automation and continuous improvement. Collaboration with engineering, operations, and product teams is essential to align reliability initiatives with business objectives and ensure seamless service delivery.

Required Skills:

  • Proven experience in site reliability, DevOps, or systems engineering, with prior leadership or team lead responsibilities
  • Strong programming/scripting skills (e.g. Python, Go, Bash, or similar)
  • Deep expertise in Linux/Unix system administration and networking
  • Experience architecting and operating cloud platforms (AWS, Azure, GCP)
  • Proficiency with infrastructure-as-code and automation tools (e.g. Terraform, Ansible, CloudFormation)
  • Advanced knowledge of monitoring, logging, and alerting solutions (e.g. Prometheus, Grafana, ELK, Datadog)
  • Demonstrated incident management and root cause analysis skills
  • Experience designing and implementing CI/CD pipelines
  • Strong understanding of containerization and orchestration (Docker, Kubernetes)
  • Ability to define and enforce reliability, scalability, and security best practices
  • Excellent communication, stakeholder management, and collaboration skills
  • Experience mentoring, coaching, and developing SRE or engineering teams
  • Strong hands-on ability to define business process dashboards in APM tools such as Dynatrace, including SLA, SLO, and SLI definition, design, and implementation as part of observability
  • Experience with devices such as scanners, POS devices, and peripheral devices (including devices with on-device memory)
  • Experience with hardcoded device protocols and software, with the ability to decode and run them and help integrate them with other modules
  • Experience with edge computing, Google Distributed Cloud, and hybrid cloud environments
  • Experience leading SRE teams in high-growth or regulated environments
  • Advanced database administration and optimization skills (both SQL, e.g. MySQL, and NoSQL, e.g. MongoDB)

Key Responsibilities:

  • Team Leadership & Development:
      • Technical expertise and hands-on experience, with the ability to lead the development team
      • Mentor team members and guide them on the right approach for SRE-related work
      • Foster a culture of operational excellence, automation, and continuous learning
      • Conduct regular team meetings, 1:1s, and performance reviews
  • Reliability Strategy & Architecture:
      • Define and implement reliability, scalability, and performance strategies for critical systems
      • Set standards for monitoring, alerting, and incident response
      • Guide architectural decisions to ensure robust, resilient infrastructure
  • Incident & Problem Management:
      • Oversee incident response, root cause analysis, and post-mortem processes
      • Coordinate with cross-functional teams to resolve complex issues and prevent recurrence
      • Drive improvements based on incident learnings
  • Process Improvement & Automation:
      • Identify and eliminate manual operational tasks through automation
      • Optimize CI/CD pipelines and deployment processes
      • Continuously enhance system reliability and efficiency
  • Stakeholder Collaboration:
      • Partner with engineering, operations, and product teams to align reliability goals with business objectives
      • Communicate reliability metrics, risks, and progress to leadership and stakeholders
  • Security & Compliance:
      • Ensure infrastructure and processes adhere to security best practices and compliance requirements
  • Experience with chaos engineering and resilience practices

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting