Role Summary
The Incident Commander is responsible for leading the end-to-end management of major production incidents across Telecom digital platforms, including Digital Commerce, Order Management, Payments, Mobile/Web Applications, and Customer Data systems. This role operates within a 24x7 centralized operations model and serves as the single point of command during high-severity incidents, ensuring rapid stabilization, clear decision-making, and effective stakeholder coordination.
The Incident Commander combines strong technical depth with exceptional communication skills to lead large cross-functional teams under pressure, minimize customer and business impact, and restore services within defined SLAs.
Key Responsibilities
Major Incident Management & Command
Act as the Incident Commander for Sev-1 and Sev-2 incidents across Telecom digital platforms.
Own the incident lifecycle from detection through stabilization, resolution, and post-incident review.
Lead incident bridge calls with a large number of technical, business, and executive stakeholders.
Establish command-and-control during incidents, driving focus, accountability, and rapid decision-making.
Ensure accurate impact assessment and prioritization based on customer, revenue, and regulatory impact.
Telecom Digital Platform Expertise
Lead incident response across:
o Digital Commerce platforms (customer acquisition, checkout, promotions)
o Order Management and fulfillment systems
o Payments, billing integrations, and financial transaction flows
o Mobile and web applications
o Customer Information and data management platforms
Quickly understand complex distributed system interactions and failure modes.
Provide technical direction and guidance during root cause identification and remediation.
Centralized Operations & 24x7 Support
Operate within a centralized 24x7 operations model supporting mission-critical digital platforms.
Coordinate across global onshore and offshore support teams, SREs, engineering, infrastructure, and vendors.
Ensure adherence to incident response SLAs, escalation paths, and operational runbooks.
Drive continuous improvement of incident response processes and tooling.
Stakeholder & Executive Communication
Serve as the single authoritative voice during incidents for internal and external stakeholders.
Communicate incident status, impact, mitigation steps, and ETAs clearly and concisely.
Manage executive-level updates and ensure consistent messaging across all forums.
Handle high-pressure situations with confidence, clarity, and professionalism.
Technical Leadership & Problem Solving
Lead technical troubleshooting efforts without necessarily being hands-on in code.
Challenge assumptions, validate hypotheses, and drive teams toward data-driven resolution paths.
Ensure effective use of monitoring, logging, and observability tools.
Balance speed of recovery with risk, customer impact, and system integrity.
Post-Incident Review & Prevention
Facilitate post-incident reviews (PIRs / RCAs) with engineering and operations teams.
Ensure root causes are clearly identified and corrective actions are defined and tracked.
Identify systemic issues and recommend long-term preventive measures.
Drive improvements in platform resilience, monitoring, automation, and operational readiness.