Reliability Engineer III

Domino's

Job Location:

Ann Arbor, MI - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Summary

Level III Site Reliability Engineers are recognized technical experts who lead complex projects and initiatives, drive innovation, and serve as key resources for both their team and the broader organization. They operate with significant autonomy, solve complex technical problems, and influence technical strategy and process improvements. Level III engineers are expected to mentor others, lead cross-functional efforts, and proactively identify opportunities to enhance reliability, scalability, and efficiency.

Key Responsibilities

Technical Leadership & Strategy: Lead the design, implementation, and optimization of reliability engineering solutions for mission-critical systems. Serve as a technical advisor to management and cross-functional teams, recommending best practices and innovative approaches. Influence technical decisions and contribute to the development of departmental or area strategy.
Operational Excellence: Oversee incident response and root cause analysis for high-impact production issues, ensuring rapid resolution and long-term prevention. Develop and refine monitoring and observability frameworks to proactively identify and address reliability and performance issues across multiple services. Drive automation initiatives, creating sophisticated tools and processes to streamline operations and reduce manual intervention.
Project & Team Leadership: Lead complex projects and initiatives, often spanning multiple teams or departments, with notable risk and complexity. Mentor and provide guidance to junior engineers, fostering a culture of continuous improvement and technical excellence. Act as a resource for colleagues, sharing expertise and building consensus on difficult or sensitive topics.
Continuous Improvement & Innovation: Proactively identify and solve unique problems that have a broad impact on the business. Develop novel solutions and innovations in tools or processes to improve organizational performance. Contribute to the development of new products, processes, or services through applicable technology.

Competencies

Expert-level knowledge of SRE concepts, operations, incident response, monitoring, and reliability.
Demonstrated ability to solve complex technical problems and exercise judgment based on multiple sources of information.
Recognized as an internal technical expert with broad knowledge across the field of specialization.
Strong leadership skills; able to lead cross-functional projects and initiatives.
Excellent communication and influence skills; able to explain complex ideas and persuade senior stakeholders.

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field; advanced degree preferred.
Minimum of 8 years supporting production applications in high-availability, mission-critical environments.
Advanced proficiency in UNIX/Linux administration, troubleshooting, and network configuration.
Extensive hands-on experience with scripting languages (e.g., Bash, Python) and automation frameworks.
Deep expertise in container orchestration (e.g., Azure Kubernetes Service), infrastructure-as-code (e.g., Terraform, Puppet), and CI/CD tools (e.g., Jenkins, GitHub Workflows, Bitbucket).
Proven track record of leading complex projects and mentoring others.
Strong written and verbal communication skills; able to create clear, concise documentation for technical and non-technical audiences.

Additional Information

Benefits:
   Paid Holidays and Vacation
   Medical, Dental & Vision benefits that start on the first day of employment
   No-cost mental health support for employees and dependents
   Childcare tuition discounts
   No-cost fitness, nutrition, and wellness programs
   Fertility benefits
   Adoption assistance
   401(k) matching contributions
   15% off the purchase price of stock
   Company bonus

Employment Type:

Full-time

About Company

Domino's

What's behind one of the world's top public restaurant brands? Fun and innovative franchise and corporate team members who are driven to win. Inspired to make each day better than the last, people may join for different reasons, but what motivates them to stay are the passionate and ta ...