Senior Consultant AI DevOps Engineer

Chennai - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Title: Senior Consultant - AI DevOps Engineer, AI Platforms

Career Level: D2

Introduction to role

Are you ready to redefine the future of technology in healthcare? As a Senior Consultant - AI DevOps Engineer at AstraZeneca, you'll be at the forefront of embedding AI across our value chain, from discovering novel compounds to enhancing patient safety and optimizing commercial outcomes. Our Enterprise AI organization is committed to delivering Connected Intelligence, Reusable AI, and AI Platforms. In this role, you'll go beyond traditional support to build, operate, and continuously improve production-grade AI platforms and pipelines. You'll be the first responder for incidents and service requests, designing and automating infrastructure, deployment workflows, observability, and governance guardrails. Collaborate with multidisciplinary engineers, data scientists, ML engineers, and platform engineers to advance healthcare for millions of patients. Are you ready to make a difference?

Accountabilities

Platform Engineering & Reliability: Build and operate AWS-based AI platform services (including Kubernetes, GPU workloads, storage, networking, and service mesh). Own SLAs/SLOs, capacity planning, cost optimization, and performance tuning for AI workloads.
MLOps & CI/CD: Design and implement end-to-end CI/CD for ML systems (data, models, services), including feature stores, model registries, artifact/version management, and automated model deployment to batch, streaming, and real-time endpoints.
Automation & Infrastructure as Code: Create reproducible environments with Infrastructure as Code (e.g. Terraform, CloudFormation, CDK). Automate environment provisioning, cluster upgrades, dependency management, and blue/green or canary deployments.
Observability & Incident Response: Implement logging, metrics, tracing, model/data drift monitoring, and alerting. Lead L1-L3 incident response, root cause analysis, and postmortems. Continuously improve runbooks and self-healing mechanisms.
Security, Compliance & Governance: Partner with Cyber Security, Data Privacy, and internal governance to implement guardrails (identity, secrets, encryption, vulnerability management).
Scalability & Performance: Optimize distributed training/inference across GPU, multicore, SMP, and distributed clusters. Guide ML engineers on parallelization, resource quotas, and cost/performance trade-offs.
Operational Excellence: Champion a production-first attitude and streamline pathways from exploratory research to production through golden paths, templates, and platform enablement.
Collaboration & Enablement: Work closely with the Connected Intelligence, Reusable AI, and AI Platforms teams. Provide training, documentation, and developer experience improvements.

Essential Skills/Experience

Education: B.Tech/M.Tech in Computer Science, Engineering, or a related quantitative field.
Cloud Expertise: 5-7 years of hands-on experience with AWS (or equivalent cloud), including core services (compute, storage, networking), IAM, and cost management.
AI Platform Experience: Experience with provisioning and managing enterprise AI platforms (Databricks, Domino, etc.) at scale is a plus.
Kubernetes at Scale: 5-7 years working with Kubernetes and containerized applications; 5 years administering production clusters, with an understanding of operators, storage classes, GPU scheduling, and autoscaling.
Programming: 3 years building and delivering software in Python; strong skills in another language (e.g. Go, Java) are valued. Ability to write robust, testable, and observable services.
Infrastructure as Code & Automation: 3 years implementing Terraform/CloudFormation/CDK, GitOps workflows (e.g. Argo CD, Flux), and CI/CD systems (e.g. GitHub Actions, GitLab CI, Jenkins).
MLOps Tooling: Experience with ML orchestration and model lifecycle tools (e.g. MLflow, Kubeflow). Familiarity with feature stores, model registries, A/B testing, and shadow deployments for ML.
Observability: Proficiency with Prometheus/Grafana, ELK/OpenSearch, and incident management.
Security & Compliance: Experience implementing security controls (secrets management, KMS encryption, RBAC) and aligning to internal security standards; GxP experience is a plus.
Agile & ITIL: Comfortable working in Agile teams; experience in support environments or ITIL is beneficial, with a strong focus on automation over manual operations.
DevOps Perspective: Demonstrated use of DevOps practices to enable automation strategies, improve developer experience, and reduce time-to-production.
Soft Skills: Creative, collaborative, and resilient, with excellent communication and the ability to translate complex technical topics for diverse stakeholders.

Desirable Skills/Experience

Data & Streaming: Experience with Spark, Databricks, Kafka/Kinesis, and scalable data pipelines.
GenAI/LLM Ops: Familiarity with LLM serving, prompt/response safety, retrieval-augmented generation (RAG), vector databases, and token-aware scaling.
Cost Optimization: Rightsizing, spot/fleet strategies, and chargeback/showback practices.

When we put unexpected teams in the same room, we ignite bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace, and challenge perceptions. That's why we work, on average, a minimum of three days per week from the office. But that doesn't mean we're not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.

At AstraZeneca, your work directly impacts patients by transforming our ability to develop life-changing medicines. We empower the business to perform at its peak by combining pioneering science with leading digital technology platforms. Join us at a crucial stage of our journey as we become a digital and data-led enterprise. With a passion for impacting lives through data analytics and AI technologies like machine learning, there's no better time to join us!

Ready to make an impact? Apply now to be part of our innovative team!

Date Posted

11-Nov-2025

Closing Date

AstraZeneca embraces diversity and equality of opportunity. We are committed to building an inclusive and diverse team representing all backgrounds, with as wide a range of perspectives as possible, and harnessing industry-leading skills. We believe that the more inclusive we are, the better our work will be. We welcome and consider applications to join our team from all qualified candidates, regardless of their characteristics. We comply with all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements.

Required Experience:

Senior IC

Key Skills

Eclipse
Disaster Recovery
.NET
High Availability
Redshift
Data Management
IP Networking
Neo4j
Data Warehouse
Pre-sales
Oracle
DynamoDB

About Company

AstraZeneca

AstraZeneca is an equal opportunity employer. AstraZeneca will consider all qualified applicants for employment without discrimination on grounds of disability, sex or sexual orientation, pregnancy or maternity leave status, race or national or ethnic origin, age, religion or belief, ...
