o Analyze incident patterns and trends to gain insights into recurring issues, collaborating with product teams to drive their resolution.
o Proactively manage alerts, identify potential problems, and work with cross-functional teams to enhance reliability and performance.
o Collaborate with product teams to prioritize operational user stories focused on reliability and performance improvements.
o Document operational workflows and troubleshooting guides to support knowledge sharing and team efficiency.
o Lead efforts to troubleshoot complex issues in collaboration with L3 and L4 support partners, ensuring swift resolution and minimal downtime.
o Participate in crisis management and response, including on-call rotations, to address critical incidents impacting the different products.
o Identify automation opportunities across operational tasks to improve efficiency and reduce manual workload.
o Collaborate with the cybersecurity team to integrate automated security enhancements into the products, operations, and infrastructure.
o Use insights from observability tools to optimize incident resolution times, improve product performance, and drive continuous improvement.
o Work closely with architects, DevOps, and engineering teams to improve product stability and reduce incidents through proactive solutions.
o Engage with the Central SRE team, the SIAM (Service & Integration Management) manager, and the SRE Community of Practice to share best practices and leverage synergies.
Full Time