Job Responsibilities
Years of Experience: 3-5 years
Responsibilities:
AI Model Deployment & Integration:
- Deploy and manage AI/ML models, including traditional machine learning and GenAI solutions (e.g., LLMs, RAG systems).
- Implement automated CI/CD pipelines for seamless deployment and scaling of AI models.
- Ensure efficient model integration into existing enterprise applications and workflows in collaboration with AI Engineers.
- Optimize AI infrastructure for performance and cost efficiency in cloud environments (AWS, Azure, GCP).
Monitoring & Performance Management:
- Develop and implement monitoring solutions to track model performance, latency, drift, and cost metrics.
- Set up alerts and automated workflows to manage performance degradation and retraining triggers.
- Ensure responsible AI by monitoring for issues such as bias, hallucinations, and security vulnerabilities in GenAI outputs.
- Collaborate with Data Scientists to establish feedback loops for continuous model improvement.
Automation & MLOps Best Practices:
- Establish scalable MLOps practices to support the continuous deployment and maintenance of AI models.
- Automate model retraining, versioning, and rollback strategies to ensure reliability and compliance.
- Utilize infrastructure-as-code (Terraform, CloudFormation) to manage AI pipelines.
Security & Compliance:
- Implement security measures to prevent prompt injections, data leakage, and unauthorized model access.
- Work closely with compliance teams to ensure AI solutions adhere to privacy and regulatory standards (HIPAA, GDPR).
- Regularly audit AI pipelines for ethical AI practices and data governance.
Collaboration & Process Improvement:
- Work closely with AI Engineers, Product Managers, and IT teams to align AI operational processes with business needs.
- Contribute to the development of AI Ops documentation, playbooks, and best practices.
- Continuously evaluate emerging GenAI operational tools and processes to drive innovation.
Skills/Qualifications:
Education:
- Bachelor's or Master's degree in Computer Science, Data Engineering, AI, or a related field.
- Relevant certifications in cloud platforms (AWS, Azure, GCP) or MLOps frameworks are a plus.
Experience:
- 3 years of experience in AI/ML operations, MLOps, or DevOps for AI-driven solutions.
- Hands-on experience deploying and managing AI models, including LLMs and GenAI solutions, in production environments.
- Experience working with cloud AI platforms such as Azure AI, AWS SageMaker, or Google Vertex AI.
Technical Skills:
- Proficiency in MLOps tools and frameworks such as MLflow, Kubeflow, or Airflow.
- Hands-on experience with monitoring tools (Prometheus, Grafana, ELK Stack) for AI performance tracking.
- Experience with containerization and orchestration tools (Docker, Kubernetes) to support AI workloads.
- Familiarity with automation scripting using Python, Bash, or PowerShell.
- Understanding of GenAI-specific operational challenges such as response monitoring, token management, and prompt optimization.
- Knowledge of CI/CD pipelines (Jenkins, GitHub Actions) for AI model deployment.
- Strong understanding of AI security principles, including data privacy and governance considerations.
Soft Skills:
- Strong problem-solving skills with the ability to troubleshoot complex AI operational issues.
- Excellent communication skills to effectively collaborate with cross-functional stakeholders.
- Proactive and results-driven mindset with a focus on operational efficiency and scalability.
- Ability to work effectively in a fast-paced, dynamic environment.