It's fun to work in a company where people truly BELIEVE in what they're doing!
Job Description:
Job Summary
We are seeking a Director of Global Technical Services and Operations Management to lead and drive process maturity and operational excellence across our IT service management (ITSM) and IT operations management (ITOM) functions, including incident response, event management, and disaster recovery. This position will have primary responsibility for leading and overseeing ITSM and ITOM functions, with additional responsibilities for 24x7 Monitoring Operations and (primarily in the EMEA and APAC time zones) coordinating all aspects of Technical Operations Management. The position will also be responsible for all aspects of Ingram Micro's release management programs and processes.
The ideal candidate would have deep experience with ITIL and tools such as ServiceNow, especially ITSM (including Change, Incident, and Problem management), ITOM, CMDB/Service Graph, and Reporting; ServiceNow integrations with other key tooling such as monitoring and observability tools (e.g., Datadog, SolarWinds, Splunk, Dynatrace); and experience working in a globally distributed, 24x7, mission-critical environment such as SaaS or eCommerce.
This role requires strong management skills and experience managing IT functions and globally distributed teams composed of both third-party and in-house resources; exceptional written and verbal communication skills; and a data-driven approach to managing performance using KPIs. The role also drives oversight and governance to ensure seamless delivery of services, and drives performance and accountability across partner teams and vendors so that our platform meets defined availability, quality, compliance, and other performance objectives.
The ideal candidate would have thought-leadership experience with AIOps and with leading an automation-centric approach (such as auto-healing and automated risk/change assessment) to driving continual process and operational excellence, maturity, and efficiency; driving innovation; improving system resiliency; and optimizing cloud and infrastructure operations.
Key Responsibilities
Strategic Leadership & Vision
- Define and execute the long-term platform engineering strategy, aligning it with business objectives.
- Integrate DevOps, SRE, and ITSM/ITOM principles to create a unified and efficient operational model.
- Drive automation and self-service capabilities to enhance developer productivity and system reliability.
- Ensure high availability and reliability for 24x7 global operations, implementing best practices for service continuity.
Infrastructure, DevOps & Automation
- Oversee cloud infrastructure (AWS, Azure, or GCP), container orchestration (Kubernetes, Docker), and CI/CD pipelines.
- Implement and integrate AIOps solutions for proactive issue detection, incident resolution, and intelligent automation.
- Drive Infrastructure as Code (IaC) adoption using tools like Terraform and Ansible.
- Develop and execute strategies for cost optimization, security, and governance across cloud environments.
IT Operations, Service Management & Observability
- Integrate ITSM/ITOM tools (e.g., ServiceNow) into DevOps and SRE workflows for automated incident management, change management, and service reliability.
- Enhance system visibility through observability and monitoring tools like Datadog, Dynatrace, New Relic, and Splunk.
- Drive automation-centric service management to improve IT operations efficiency and reduce mean time to resolution (MTTR).
Technology & Architecture
- Architect and oversee resilient, scalable, and secure platform solutions incorporating AIOps, machine-learning-driven automation, and event-driven architectures.
- Implement API-first and integration-centric approaches for seamless interoperability across IT and engineering ecosystems.
- Ensure the alignment of ITSM, DevOps, and cloud-native technologies to create a highly automated and efficient operational model.
Team Leadership & Collaboration
- Foster a culture of automation, continuous improvement, and operational excellence.
- Collaborate closely with security, software engineering, and product teams to streamline workflows and enhance service reliability.
- Ensure 24x7 operational excellence by implementing on-call rotations, automated incident response, and real-time monitoring.
Performance, Reliability & Incident Management
- Implement SRE principles, defining and tracking SLAs, SLOs, and error budgets to maintain system reliability.
- Develop and refine incident response, root cause analysis, and postmortem processes using ITSM/ITOM automation.
- Optimize service health, incident response, and operational resilience through proactive monitoring and analytics-driven insights.
Qualifications & Experience
Required:
- 10 years of experience in software engineering, cloud infrastructure, or platform engineering.
- 5 years of leadership experience managing globally distributed platform, SRE, or DevOps teams in a 24x7 operational environment.
- Proven expertise integrating DevOps, SRE, and ITSM/ITOM to drive operational efficiency.
- Strong knowledge of cloud platforms (AWS, Azure, GCP), Kubernetes, and microservices architecture.
- Experience with ServiceNow, AIOps, and IT automation tools to optimize IT operations.
- Hands-on expertise in CI/CD pipelines, Infrastructure as Code (Terraform, Ansible), and observability tools (Datadog, Dynatrace, New Relic, Splunk).
- Strong background in automation-centric approaches to enhance self-healing infrastructure and intelligent workflows.
- Experience implementing AI-driven monitoring, predictive analytics, and auto-remediation solutions.
- Excellent verbal and written communication skills, with the ability to present technical concepts to executive leadership and cross-functional teams.
Preferred:
- Experience with event-driven architecture and serverless computing.
- Knowledge of FinOps, cloud cost optimization, and security best practices.
- Prior experience in performance engineering, security automation, or AI/ML infrastructure.
Why Join Us
- Work with cutting-edge technologies, AI-driven automation, and cloud-native solutions.
- Competitive salary, equity, and comprehensive benefits.
- A culture of innovation, collaboration, and continuous improvement.
#LIHybrid #IngramMicroBulgaria #LIVA1
Required Experience:
Exec