Major Incident Commander
Job Summary
We are seeking a highly accomplished Principal Incident Commander / Director Incident Management to lead enterprise-wide response to critical incidents across complex, large-scale, and globally distributed infrastructure environments.
This role operates at the intersection of technology leadership, crisis management, and business continuity, requiring the ability to make high-stakes decisions, influence senior stakeholders, and drive rapid resolution during mission-critical outages. The individual will serve as the ultimate authority during major incidents, ensuring minimal business disruption and long-term resilience.
Requirements
Strategic Responsibilities
- Own and lead enterprise-level incident management strategy across global operations.
- Act as the executive Incident Commander for P0/P1 incidents impacting business-critical systems.
- Establish and drive incident governance frameworks, SLAs, and response protocols
- Lead cross-functional crisis response involving Network, Cloud, Infrastructure, Security, and Field Operations
- Influence and align with C-suite and senior leadership during high-impact incidents
- Drive business continuity and service resilience initiatives
Operational Leadership
- Command and orchestrate war rooms and global bridge calls with multiple stakeholders
- Serve as the highest escalation point for critical outages and service disruptions.
- Ensure rapid triage, containment, and resolution of incidents with minimal downtime
- Drive real-time decision-making under ambiguity and pressure
- Oversee post-incident reviews and enforce accountability across teams
Technical Expertise
- Deep expertise in enterprise networking and distributed systems:
  - BGP, OSPF, EIGRP, TCP/IP, QoS
  - WAN, SD-WAN, and Data Center architectures (Spine-Leaf)
- Strong understanding of:
  - Load balancing, DNS, DHCP, Network Security
  - Latency, packet loss, and performance optimization
- Familiarity with cloud platforms and hybrid infrastructure environments
- Ability to engage in hands-on technical triage when required
- Lead Root Cause Analysis (RCA) at an organizational level
- Drive preventive engineering, automation, and process maturity
- Establish a culture of proactive monitoring and early detection
- Enhance incident response playbooks, runbooks, and training programs
Preferred Qualifications
- ITIL Expert / Advanced Incident Management certifications
- Exposure to Disaster Recovery (DR) & Business Continuity Planning (BCP)
- Experience with automation, observability platforms, and AI-driven monitoring
- Track record of driving transformation in incident management practices
- 12-18 years of experience in Network Engineering, SRE, NOC, or Cloud Operations
- Proven experience handling enterprise-scale, high-impact incidents globally
- Prior experience in large enterprises / telecom / hyperscalers / global tech organizations
- Strong leadership presence with the ability to influence without authority
- Experience working in 24x7 mission-critical environments
Benefits
Required Skills:
Bachelor's degree in Business Administration, HR, or a related field. 2 years of experience in office administration or a similar role. Strong organizational and multitasking skills. Excellent verbal and written communication abilities. Proficient in MS Office (Word, Excel, Outlook). Familiarity with insurance coordination and asset management systems is a plus.
Required Education:
Bachelor's degree