Overview
A dynamic and skilled Manager - SRE is required to drive reliability, performance, and operational excellence across critical systems and services. This role involves working closely with engineering teams to build scalable infrastructure, streamline processes, and ensure seamless service delivery. The ideal candidate will have strong troubleshooting skills, deep technical understanding, and the leadership capability to guide SRE practices.
Key Responsibilities:
- Lead and manage a team of Site Reliability Engineers, providing guidance, mentorship, and support.
- Collaborate with cross-functional teams to define and implement strategies for improving system reliability, scalability, and performance.
- Monitor and analyze system performance metrics, identifying areas for improvement and implementing proactive solutions.
- Troubleshoot and resolve complex technical issues, ensuring minimal impact on system availability.
- Implement and maintain monitoring, alerting, and incident response systems.
- Develop and maintain documentation for system configurations, processes, and procedures.
- Stay up to date with industry trends and emerging technologies, recommending and implementing innovative solutions.
Job requirements
- Previous experience in a similar role managing a team of Site Reliability Engineers.
- Strong knowledge of Kubernetes.
- Proficiency in scripting and automation using languages such as Python, Bash, or PowerShell.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and leadership abilities.