Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Incident & Request Manager, Non-Production Environments
Location: Atlanta, GA / Bellevue, WA
Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Onsite
Job Description:
- The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance).
- Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement.
- The Incident Manager directly manages a team of Incident Analysts and SREs, partners with DevOps teams to automate detection and response, and works closely with Environment and Change Managers to reduce recurrence of issues.
Key Responsibilities:
Incident Management:
- Own the incident lifecycle: detection, triage, response, resolution, and closure.
- Act as the primary escalation point for project/product delivery teams during NPE incidents.
- Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
- Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
- Track and improve incident SLAs (MTTR, MTTD, availability SLOs).
Request Management:
- Own request fulfilment for project/product delivery teams (e.g. access entitlements, environment service requests).
- Standardize and automate common request types in collaboration with Intake and DevOps teams.
- Ensure requests are logged, prioritized, and fulfilled within SLA.
- Provide transparency to stakeholders on request status.
Team Leadership:
- Manage and mentor Incident Analysts and SREs.
- Ensure follow-the-sun coverage via offshore/onshore teams.
- Build a culture of blameless incident management, automation-first practices, and continuous learning.
Governance & RCA:
- Ensure all incidents have documented Root Cause Analysis (RCA).
- Track corrective and preventive actions and feed them into Change and Environment management processes.
- Provide trend reporting and insights to leadership.
SRE & DevOps Alignment:
- Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
- Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.
Stakeholder Communication:
- Provide timely updates during incidents and delays in request fulfilment.
- Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
- Maintain trust with project/product delivery teams by ensuring transparent communication.
Required Skills & Experience:
- 8 to 10 years in Incident Management, Service Operations, or SRE leadership.
- Experience managing Incident Analysts and SRE teams.
- Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
- Deep understanding of ITIL Incident, Problem, and Request Management processes.
- Excellent crisis management, communication, and stakeholder engagement skills.
TekWissen Group is an equal opportunity employer supporting workforce diversity.