Description
You will manage a talented team of data scientists and AI engineers driving the adoption of intelligent automation, predictive analytics, and proactive problem resolution across our complex IT landscape. This position requires a leader with a deep understanding of both data science principles and the intricacies of enterprise IT operations.
Responsibilities
- Strategic Leadership: Define and execute the AIOps strategy and roadmap, aligning it with overall IT and business objectives. Identify opportunities to leverage AI/ML for enhanced IT observability, incident management, performance optimization, and automation.
- Team Management & Development: Lead, mentor, and grow a high-performing team of data scientists and AI engineers. Foster a culture of innovation, continuous learning, and technical excellence.
- Solution Design & Development: Oversee the end-to-end design, development, and deployment of AIOps solutions, including anomaly detection, predictive failure analysis, root cause analysis, intelligent alerting, and automated remediation.
- Cross-Functional Collaboration: Partner closely with IT Operations, Site Reliability Engineering (SRE), Network Engineering, Application Development, and other stakeholders to understand operational challenges and deliver impactful AI-driven solutions.
- Data & Platform Management: Ensure the availability, quality, and governance of operational data necessary for AI/ML model training and inference. Drive the selection, integration, and optimization of AIOps platforms and tools.
- Model Lifecycle Management: Establish robust MLOps practices for model development, testing, deployment, monitoring, and retraining to ensure the continuous effectiveness and reliability of AI models in production.
- Innovation & Research: Stay abreast of the latest advancements in AI/ML, AIOps, and IT operations. Drive research and experimentation to explore new techniques and technologies that can further enhance our operational intelligence.
- Performance & Metrics: Define key performance indicators (KPIs) for AIOps initiatives and regularly report on the impact and value delivered to the organization.
- Budget & Resource Management: Manage project budgets, resources, and timelines effectively to ensure successful delivery of AIOps programs.
Qualifications
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, Engineering, or a related quantitative field.
- 10 years of progressive experience in data science, machine learning, and/or AI engineering.
- 5 years of experience in a leadership or management role leading technical teams focused on data science or AI.
- Proven experience in designing, developing, and deploying AI/ML models for real-world applications, particularly within IT operations or related domains (e.g., observability, security, infrastructure management).
- Strong understanding of IT operations concepts, including monitoring, alerting, incident management, change management, and IT service management (ITSM).
- Proficiency in programming languages commonly used in data science and AI (e.g., Python, Scala, Java).
- Hands-on experience with big data technologies (e.g., Spark, Hadoop, Kafka) and cloud platforms (AWS, Azure, GCP).
- Solid grasp of machine learning algorithms (e.g., supervised, unsupervised, deep learning) and statistical modeling.
- Excellent communication, interpersonal, and leadership skills, with the ability to articulate complex technical concepts to non-technical stakeholders.
- Demonstrated ability to drive strategic initiatives, manage complex projects, and deliver results in a fast-paced environment.
Preferred Qualifications:
- Experience with specific AIOps platforms or tools (e.g., Splunk, Dynatrace, Moogsoft, PagerDuty, ServiceNow, Datadog, ELK stack).
- Familiarity with IT service management frameworks (e.g., ITIL).
- Experience with containerization (Docker, Kubernetes) and microservices architectures.
- Knowledge of MLOps best practices and tools for automating and managing the ML lifecycle.
- Experience in a large-scale enterprise environment with diverse and complex IT infrastructure.