Overview:
TekWissen is a global workforce management provider operating in India and many other countries. The job opportunity below is with one of our clients, which is part of a trusted global innovator of IT and business services headquartered in Tokyo. The client helps its customers transform through consulting, industry solutions, business process services, IT modernization, and managed services, enabling them to move confidently into the digital future. The client is committed to long-term success and combines global reach with local client attention to serve customers in over 50 countries.
Position: Senior MLOps / AIOps Platform Engineer
Location: Chennai / Pune
Work Type: Hybrid
Job Type: Full Time
Job Description:
- We are seeking a Senior MLOps / AIOps Platform Engineer with deep DevSecOps expertise and hands-on experience managing enterprise-grade AI/ML platforms.
- This critical role focuses on building, configuring, and operationalizing secure, scalable, and reusable infrastructure and pipelines that support AI and ML initiatives across the enterprise.
- The ideal candidate will have a strong background in Infrastructure as Code (IaC), pipeline automation, and platform engineering, with specific experience configuring and maintaining IBM watsonx and Google Cloud Vertex AI environments.
Key Responsibilities
Platform Engineering & Operations
- Lead the provisioning, configuration, and ongoing support of IBM watsonx and Google Cloud Vertex AI platforms.
- Ensure platforms are production-ready, secure, cost-efficient, and performant across training, inference, and orchestration workflows.
- Manage lifecycle tasks such as patching, upgrades, integrations, and service reliability.
- Partner with security, compliance, and product teams to align platforms with enterprise and regulatory standards.
Enterprise MLOps / AIOps Enablement
- Define and implement standardized MLOps/AIOps practices across business units for consistency and scalability.
- Build and maintain reusable workflows for model development, deployment, retraining, and monitoring.
- Provide onboarding, enablement, and support to AI/ML teams adopting enterprise platforms and tools.
- Support the development and deployment of GenAI applications and maintain them at enterprise scale.
DevSecOps Integration
- Embed security and compliance guardrails across the ML lifecycle, including CI/CD pipelines and IaC templates.
- Implement policy-as-code, access controls, vulnerability scanning, and automated compliance checks.
- Ensure all deployments meet enterprise and regulatory requirements (HIPAA, SOX, FedRAMP, etc.).
Infrastructure as Code & Automation
- Design and maintain IaC templates (Terraform, Pulumi, Ansible, CloudFormation) for reproducible ML infrastructure.
- Build and optimize CI/CD pipelines for AI/ML assets, including data pipelines, training workflows, deployment artifacts, and monitoring systems.
- Enforce best practices around automation, reusability, and observability of infrastructure and workflows.
Monitoring, Logging & Observability
- Implement comprehensive observability for AI/ML workloads using Prometheus, Grafana, Stackdriver, or Datadog.
- Monitor both infrastructure health (CPU, memory, cost) and ML-specific metrics (model drift, data integrity, anomaly detection).
- Define KPIs and usage metrics to measure platform performance, adoption, and operational health.
Qualifications
Education
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
Experience
- 5 years in MLOps, DevOps, Platform Engineering, or Infrastructure Engineering.
- 2 years applying DevSecOps practices (secure CI/CD, vulnerability management, policy enforcement).
- Hands-on experience configuring and managing enterprise AI/ML platforms (IBM watsonx, Google Vertex AI).
- Demonstrated success in building and scaling ML infrastructure, automation pipelines, and platform support models.
Technical Skills
- Proficiency with IaC tools (Terraform, Pulumi, Ansible, CloudFormation).
- Strong scripting skills in Python and Bash.
- Deep understanding of containerization and orchestration (Docker, Kubernetes).
- Experience with model lifecycle tools (MLflow, TFX, Vertex Pipelines, or equivalents).
- Familiarity with secrets management, policy-as-code, access control, and monitoring tools.
- Working knowledge of data engineering concepts and their integration into ML pipelines.
Preferred
- Cloud certifications (e.g., GCP Professional ML Engineer, AWS DevOps Engineer, IBM Cloud AI Engineer).
- Experience supporting platforms in regulated industries (HIPAA, FedRAMP, SOX, PCI-DSS).
- Contributions to open-source projects in MLOps, automation, or DevSecOps.
- Familiarity with responsible AI practices, including governance, fairness, interpretability, and explainability.
- Hands-on experience with enterprise feature stores, model monitoring frameworks, and fairness toolkits.
TekWissen Group is an equal opportunity employer supporting workforce diversity.