Incident & Request Manager - Non-Production Environments (Onsite)

Sumeru Solutions

Job Location:

Atlanta, GA - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy
The job posting is outdated and position may be filled

Job Summary

Job Title: Incident & Request Manager

Location: Atlanta, GA or Bellevue, WA (locals only)

Role Overview

  • The Incident & Request Manager leads the incident response and request management function for all non-production environments (Dev, QA, UAT, Performance).
  • Acting as the escalation point for project/product delivery teams, this role ensures incidents are resolved quickly, requests are fulfilled efficiently, and learnings are embedded into continuous improvement.
  • The Incident Manager directly manages a team of Incident Analysts and SREs, partners with DevOps teams to automate detection and response, and works closely with Environment and Change Managers to reduce recurrence of issues.

Key Responsibilities

Incident Management

  • Own the incident lifecycle: detection, triage, response, resolution, and closure.
  • Act as the primary escalation point for project/product delivery teams during NPE incidents.
  • Lead war rooms for critical incidents, coordinating with technical and delivery stakeholders.
  • Ensure timely escalation to Environment, Change, DevOps, Infra, and Security teams when required.
  • Track and improve incident SLAs (MTTR, MTTD, availability SLOs).

Request Management

  • Own request fulfilment for project/product delivery teams (e.g. access entitlements environment service requests).
  • Standardize and automate common request types in collaboration with Intake and DevOps teams.
  • Ensure requests are logged, prioritized, and fulfilled within SLA.
  • Provide transparency to stakeholders on request status.

Team Leadership

  • Manage and mentor Incident Analysts and SREs.
  • Ensure follow-the-sun coverage via offshore/onshore teams.
  • Build a culture of blameless incident management, automation-first practices, and continuous learning.

Governance & RCA

  • Ensure all incidents have documented Root Cause Analysis (RCA).
  • Track corrective and preventive actions and feed them into Change and Environment management processes.
  • Provide trend reporting and insights to leadership.

SRE & DevOps Alignment

  • Work with SREs and DevOps teams to automate incident detection, rollback, and recovery.
  • Integrate observability tools (Splunk, Prometheus, Grafana) into proactive monitoring.

Stakeholder Communication

  • Provide timely updates during incidents and delays in request fulfilment.
  • Publish regular reports on incident trends, RCA outcomes, and SLA adherence.
  • Maintain trust with project/product delivery teams by ensuring transparent communication.

Required Skills & Experience

  • 8-10 years in Incident Management, Service Operations, or SRE leadership.
  • Experience managing Incident Analysts and SRE teams.
  • Strong knowledge of AWS, Kubernetes, CI/CD pipelines, and observability tools (Splunk, Prometheus, Grafana).
  • Deep understanding of ITIL Incident, Problem, and Request Management processes.
  • Excellent crisis management, communication, and stakeholder engagement skills.

Key Skills

  • Experience Working With Students
  • Google Docs
  • Organizational skills
  • Classroom Experience
  • Data Collection
  • Materials Handling
  • Workers' Compensation Law
  • OSHA
  • Special Operations
  • Team Management
  • Experience with Children
  • Supervising Experience