AI Engineer: AIOps / IT Infrastructure ML
Role Overview
The AI Engineer (Intelligent Operations) is responsible for designing and implementing production-ready AI and machine learning (AI/ML) solutions that enhance automation, optimize processes, and provide predictive capabilities within IT infrastructure and operations. The role focuses primarily on integrating AI into IT operations workflows and collaborating with teams specializing in observability, IT service management (ITSM), cloud, and infrastructure.
Core Responsibilities
AI and AIOps Engineering
- Develop machine learning and AI models addressing key operational areas:
  - Incident prediction and early-warning systems to prevent downtime.
  - Event correlation and noise reduction to streamline alerts.
  - Capacity forecasting and anomaly detection for optimal resource utilization.
- Shift IT operations from reactive analytics to proactive, predictive operations.
Integration with IT Operations Platforms
- Integrate AI models with IT operations platforms, including:
  - Observability systems such as Dynatrace (logs, metrics, traces).
  - ITSM platforms, particularly ServiceNow, to improve incident and problem management workflows.
  - Automation tools to enable intelligent operational actions and remediation pipelines.
MLOps for Infrastructure
- Deploy models as APIs, microservices, or batch jobs to support operations teams.
- Implement the full MLOps lifecycle: model versioning, CI/CD, drift monitoring, explainability, and reliability.
- Ensure models are production-ready, compliant, stable, and auditable, especially in banking environments.
Cloud and Platform Engineering
- Run AI workloads on leading cloud platforms: Azure, AWS, and GCP.
- Optimize compute, storage, and inference costs for 24/7 operational environments.
Required Skills
- Proficiency in Python and machine learning engineering.
- Expertise in time-series analysis, anomaly detection, and forecasting.
- Hands-on experience with MLOps practices and tools.
- Familiarity with infrastructure telemetry, including logs, metrics, and events.
- Experience with cloud platforms and containerized environments (preferably Kubernetes).
Preferred Attributes
- Proven ability to collaborate closely with IT operations, observability, and automation teams.
- Experience with enterprise-scale IT infrastructure, particularly in banking or financial services.
- Ability to translate AI/ML insights into actionable operational improvements.