Level III Site Reliability Engineers are recognized technical experts who lead complex projects and initiatives, drive innovation, and serve as key resources for both their team and the broader organization. They operate with significant autonomy, solve complex technical problems, and influence technical strategy and process improvements. Level III engineers are expected to mentor others, lead cross-functional efforts, and proactively identify opportunities to enhance reliability, scalability, and efficiency.
Key Responsibilities
- Technical Leadership & Strategy: Lead the design, implementation, and optimization of reliability engineering solutions for mission-critical systems. Serve as a technical advisor to management and cross-functional teams, recommending best practices and innovative approaches. Influence technical decisions and contribute to the development of departmental or area strategy.
- Operational Excellence: Oversee incident response and root cause analysis for high-impact production issues, ensuring rapid resolution and long-term prevention. Develop and refine monitoring and observability frameworks to proactively identify and address reliability and performance issues across multiple services. Drive automation initiatives, creating sophisticated tools and processes to streamline operations and reduce manual intervention.
- Project & Team Leadership: Lead complex projects and initiatives, often spanning multiple teams or departments, with notable risk and complexity. Mentor and provide guidance to junior engineers, fostering a culture of continuous improvement and technical excellence. Act as a resource for colleagues, sharing expertise and building consensus on difficult or sensitive topics.
- Continuous Improvement & Innovation: Proactively identify and solve unique problems that have a broad impact on the business. Develop novel solutions and innovations in tools or processes to improve organizational performance. Contribute to the development of new products, processes, or services through applicable technology.
Competencies
- Expert-level knowledge of SRE concepts, operations, incident response, monitoring, and reliability.
- Demonstrated ability to solve complex technical problems and exercise judgment based on multiple sources of information.
- Recognized as an internal technical expert with broad knowledge across the field of specialization.
- Strong leadership skills; able to lead cross-functional projects and initiatives.
- Excellent communication and influence skills; able to explain complex ideas and persuade senior stakeholders.
Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field; advanced degree preferred.
- Minimum of 8 years supporting production applications in high-availability, mission-critical environments.
- Advanced proficiency in UNIX/Linux administration, troubleshooting, and network configuration.
- Extensive hands-on experience with scripting languages (e.g., Bash, Python) and automation frameworks.
- Deep expertise in container orchestration (e.g., Azure Kubernetes Service), infrastructure-as-code (e.g., Terraform, Puppet), and CI/CD tools (e.g., Jenkins, GitHub Workflows, Bitbucket).
- Proven track record of leading complex projects and mentoring others.
- Strong written and verbal communication skills; able to create clear, concise documentation for technical and non-technical audiences.
Additional Information:
Benefits:
- Paid holidays and vacation
- Medical, dental, and vision benefits that start on the first day of employment
- No-cost mental health support for employees and dependents
- Childcare tuition discounts
- No-cost fitness, nutrition, and wellness programs
- Fertility benefits
- Adoption assistance
- 401(k) matching contributions
- 15% off the purchase price of stock
- Company bonus
Remote Work:
No
Employment Type:
Full-time