We are seeking a highly skilled and proactive Problem Management Engineer to join our IT Infrastructure team. This role is responsible for identifying, analyzing, and resolving the root causes of recurring incidents and problems across infrastructure, cloud, compute, and network environments, and is pivotal in ensuring the stability, reliability, and performance of our IT infrastructure. The ideal candidate will work closely with cross-functional teams to drive service stability, reduce incident volumes, and improve overall IT service quality.
Own and manage the end-to-end Problem Management lifecycle (detection, logging, diagnosis, resolution, and closure) in accordance with ITIL best practices.
Perform root cause analysis (RCA) for critical and recurring incidents across infrastructure, cloud, compute, and network environments.
Work with Incident Management, Change Management, and Service Owners to ensure timely resolution and prevention.
Maintain and improve the Known Error Database (KEDB) and ensure knowledge sharing across teams.
Analyze trends and metrics to proactively identify potential issues and areas for improvement.
Conduct post-incident reviews (PIRs) and ensure action items are tracked to completion.
Contribute to the development of automation and monitoring strategies to prevent problem recurrence.
Ensure compliance with ITIL best practices and internal governance.
Implement preventative measures to reduce incident recurrence and improve service quality.
Act as a liaison between technical teams and business stakeholders to communicate problem status, impact, and resolution.
Prepare detailed RCA reports, dashboards, and executive summaries for leadership and audit.
Track and report on problem management KPIs such as MTTR (Mean Time to Resolve) and recurrence rate.
Ensure all problem records are accurately documented in the ITSM tool (e.g., ServiceNow).
English (fluent): oral and written required.
Expertise in Microsoft Office: Outlook, Word, Excel, PowerPoint.
Mandatory
Minimum 5 years of experience in IT operations, infrastructure support, or cloud services, with a focus on Problem Management.
Knowledge of the ITIL v3/v4 framework (ITIL certification preferred).
Proven experience in managing problems across:
Cloud platforms (AWS, Azure, GCP)
Compute and virtualization (VMware, Hyper-V)
Networking (LAN/WAN, firewalls, load balancers, DNS, VPN)
Storage and backup systems
Familiarity with monitoring and observability tools (e.g., Site24x7, Grafana).
Experience with ITSM platforms (e.g., ServiceNow, BMC Helix).
Strong analytical and troubleshooting skills with a methodical approach.
Communication, documentation, and stakeholder management skills.
Remote Work: No
Employment Type: Full-time