Due to a Federal Government contract requirement, U.S. citizenship is required for this position.
FedRAMP staff may be subject to customer or third-party background checks, up to and including Secret clearance, if required by their role at SentinelOne.
What are we Looking for
As a Staff Incident and Escalation Manager, you will lead the response to high-severity production incidents that impact customers and mission-critical services. Operating in a fast-paced, cloud-native environment with globally distributed teams, you will act as the central point of coordination during major incidents, ensure timely resolution, maintain clear communication, and drive long-term process improvements. This is a high-impact role with visibility across the organization and a direct influence on customer trust and platform reliability.
What will you do
- Serve as the primary incident commander for high-severity incidents across our production environment.
- Coordinate real-time troubleshooting efforts across globally distributed engineering and operations teams.
- Provide timely, accurate updates to stakeholders, customers (as needed), and executive leadership.
- Determine appropriate escalation paths and timing to drive fast resolution.
- Collaborate across Engineering, SRE, Product, and Support functions to ensure rapid alignment and resource mobilization.
- Facilitate post-incident reviews for high-severity events, ensuring root cause analysis and comprehensive documentation.
- Promote a blameless culture and drive learning-focused retrospectives.
- Ensure follow-up action items are clearly defined, assigned, and completed on schedule.
- Maintain visibility into resolution progress and escalate blockers as needed.
- Enhance incident response practices through process development, tooling improvements, and knowledge sharing.
- Partner with global SRE and Engineering teams to improve observability, monitoring, alerting, and runbook quality.
- Participate in a rotating on-call schedule as a designated incident commander for major incidents.
- Be available during your on-call shift to lead incident calls, coordinate cross-functional teams, and drive resolution.
- Ensure smooth handoffs between on-call rotations and maintain accurate status documentation.
- On-call participation is shared equitably across the team and supported with clear escalation protocols and backup coverage.
What Skills and Experience Will You Need
- 5 years of experience in incident management, SRE, or operations roles within SaaS or cloud-native environments.
- Demonstrated ability to lead complex, high-severity incident response efforts across global teams.
- Strong communication and leadership skills with the ability to stay composed under pressure.
- Experience with observability and incident tooling (e.g., PagerDuty, Opsgenie, Datadog, Splunk, Jira).
- Deep understanding of service reliability principles, escalation strategies, and root cause analysis methodologies.
- Comfortable working across time zones and in a fast-paced, evolving environment.
Why Us
- You will be joining a cutting-edge company where you will tackle extraordinary challenges and work with the very best in the industry
- Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid company holidays
- Paid sick time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement
- Numerous company-sponsored events including regular happy hours and team-building events
Required Experience:
Manager