Head of Global Incident Management

Job Location

Chicago, IL - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies, from the world's largest enterprises to the most ambitious startups, use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.

About the team

The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s reliability, and this team is at the forefront of ensuring we keep it that way, working hand in hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRMs) is defined by our sense of ownership and how we drive incidents to resolution, marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks, and anything that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. The team is skilled in communications, incident handling, and technical adeptness, as incidents can arise from anywhere and cut across products and orgs in Stripe.

What you'll do

This position entails leading and optimizing Stripe's incident management processes and automation, ensuring efficiency and adherence to stringent incident response metrics. As the head of the incident response team, you will establish and maintain a best-in-class incident response framework, upholding the reliability standards expected of Stripe. Responsibilities include, but are not limited to, incident classification, escalation, and notification management, along with accountability for key incident response metrics (TTx). You will generate actionable insights to drive continuous improvement, collaborating with engineering leadership to refine incident detection, response, user communication, and tooling efficacy. Leadership and development of a highly effective 24/7 global incident response management team, characterized by urgency, programmatic ownership of incidents and communications, and the capacity to engage engineering teams, are crucial. Additionally, you will manage incident communications across multiple channels for executive and end-user audiences and identify automation opportunities to streamline incident response workflows, thereby safeguarding users and minimizing disruption to their operations.

Responsibilities

  • Lead the global 24/7 team of regional managers and incident response managers, with the ability to be hands-on and support frontline on-call with speed, cross-functional collaboration, and escalation.
  • Develop and own Stripe's incident response and management strategy and cross-functional roadmap, ensuring it aligns with the company's reputation for reliability.
  • Spearhead and manage Stripe's AI-First strategy for automation of incident response workflows, partnering with the engineering team to implement required tooling enhancements.
  • Enhance Stripe's incident response by leading and implementing improvements derived from analyzing user-facing incidents and extracting actionable insights and learnings.
  • Collaborate closely with executive leadership, engineering, and operations teams to lead significant programs and reshape workflows and metrics concerning reliability and incident operations.
  • Manage relevant TTx metrics, particularly those related to communication and escalation. Collaborate with engineering leadership to implement necessary improvements for each metric.
  • Develop user-focused metrics and data to guide Stripe's incident response, reliability strategy, and user communications (including RCAs), ensuring impactful decision-making.

Who you are

We're looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • 10 years of management experience, including 4 years of experience managing managers, with a proven record in building, growing, and transforming teams.
  • Extensive experience (8 years) leading incident response for complex, large-scale distributed services with high SLOs/SLAs, coupled with deep expertise in crisis management.
  • Demonstrated ability to lead, influence other leaders, and deliver complex strategic projects involving multiple stakeholders.
  • Strong analytical skills and the ability to use data to drive business decisions.
  • Proficiency in basic incident troubleshooting and a reasonable understanding of system architecture. Fluent in using SQL, Splunk, or similar query languages.
  • Exceptional communication abilities, capable of adapting incident updates for diverse audiences (executives, external users, internal teams).
  • Affinity for a fast-paced work environment, crafting strategic and rapid fixes to high-intensity problems with a keen eye for detail and a high bar for quality.
  • Comfort navigating ambiguity while identifying areas for process improvement and establishing best practices.

Preferred qualifications

  • Experience managing geographically dispersed teams.
  • Experience using infrastructure and application monitoring tools such as Prometheus, Sentry, and others.
  • Experience in incident response at a high-growth technology company, preferably within the payments or e-commerce sectors.
  • Proven ability to apply agentic and generative AI to revolutionize incident response, coupled with a strong grasp of current industry trends in the incident response domain.
  • Demonstrated history of driving engineering and process enhancements to improve incident response efficiency within a rapidly expanding technology organization.

Required Experience:

Exec

Employment Type

Full Time
