Description
This is a unique opportunity to lead a key part of OCI's Observability stack, focused on the Telemetry, Monitoring, and Alarming systems that are essential to ensuring the performance, availability, and trustworthiness of all Oracle Cloud services. Our mission is to deliver a world-class Integrated Observability and Management platform that seamlessly supports OCI hybrid and multi-cloud environments.
Our platform combines Monitoring, Alarming, Logging, Events, Auditing, and SIEM capabilities to give customers and internal teams a unified, actionable view into their infrastructure and applications. This role focuses specifically on the Monitoring and Alarming platform, which provides the foundation for real-time metric ingestion, scalable alerting, incident detection, and proactive canary-based health verification of services.
We are looking for a Senior Engineering Manager to lead an exceptionally talented team of software engineers in advancing this critical part of OCI's platform. You will drive innovation and scale to ensure our Telemetry systems remain among the most reliable, performant, and intelligent in the modern cloud landscape.
Responsibilities
- Own the design, development, and operation of a high-scale, distributed telemetry platform that processes billions of datapoints and petabytes of time-series data across OCI regions.
- Ensure the reliability, availability, and operational excellence of the services responsible for Monitoring, Alarming, and canary-based health checks that support mission-critical infrastructure.
- Provide technical leadership, direction, and strategic vision for a team of senior and principal engineers, fostering a culture of innovation, accountability, and continuous improvement.
- Define and execute a clear, prioritized roadmap of features, platform investments, and operational improvements, delivering on commitments on time and with high quality.
- Collaborate cross-functionally with Product Management, other OCI service teams, and Oracle-wide stakeholders to align goals, manage dependencies, and drive integrated solutions.
- Drive and mature engineering processes, including design reviews, operational readiness reviews, quality standards, and incident postmortems.
- Represent the team in executive-level updates and strategic planning discussions, articulating technical direction, risks, and delivery status.
- Proactively monitor the health and performance of services across the global OCI fleet, identifying trends, mitigating risks, and ensuring fault-tolerant, scalable telemetry infrastructure.
Qualifications
Career Level - M3
Required Experience:
Manager