We are looking for a proactive and detail-oriented AI OPS Engineer to support the deployment, monitoring, and maintenance of AI/ML models in production. Reporting to the AI Developer, this role will focus on MLOps practices, including model versioning, CI/CD, observability, and performance optimization in cloud and hybrid environments.
Key Responsibilities:
- Build and manage CI/CD pipelines for ML models using platforms like MLflow, Kubeflow, or SageMaker.
- Monitor model performance and health using observability tools and dashboards.
- Ensure automated retraining, version control, rollback strategies, and audit logging for production models.
- Support deployment of LLMs, RAG pipelines, and agentic AI systems in scalable containerized environments.
- Collaborate with AI Developers and Architects to ensure reliable and secure integration of models into enterprise systems.
- Troubleshoot runtime issues, latency, and accuracy drift in model predictions and APIs.
- Contribute to infrastructure automation using Terraform, Docker, Kubernetes, or similar technologies.
Qualifications:
Required Qualifications:
- 3-5 years of experience in DevOps, MLOps, or platform engineering roles, with exposure to AI/ML workflows.
- Hands-on experience with deployment tools like Jenkins, Argo, GitHub Actions, or Azure DevOps.
- Strong scripting skills (Python, Bash) and familiarity with cloud environments (AWS, Azure, GCP).
- Understanding of containerization, service orchestration, and monitoring tools (Prometheus, Grafana, ELK).
- Bachelor's degree in Computer Science, IT, or a related field.
Preferred Skills:
- Experience supporting GenAI or LLM applications in production.
- Familiarity with vector databases, model registries, and feature stores.
- Exposure to security and compliance standards in model lifecycle management.
Remote Work:
No
Employment Type:
Full-time