ROLE SUMMARY
The Sr. Manager/Staff Engineer, AI Infrastructure & MLOps Engineering is a senior technical leader responsible for architecting, building, and scaling Pfizer's AI infrastructure and developer platforms. This role leverages extensive experience in cloud engineering, DevOps, and MLOps to deliver robust, high-performance solutions supporting advanced AI/ML workloads in biotechnology, healthcare, and enterprise technology. The successful candidate will drive innovation in automation, reliability, and scalability, enabling scientists and engineers to rapidly develop, deploy, and monitor machine learning models in production environments.
ROLE RESPONSIBILITIES
Platform Architecture & Engineering
- Design, implement, and own large-scale, cloud-based HPC and MLOps platforms supporting AI model training, genomic sequencing, and precision medicine.
- Architect multi-environment clusters (AWS, GCP, Azure) enabling GPU/FPGA workloads and advanced observability.
- Lead the development of developer and cloud platforms, including internal engineering accelerators and reusable toolsets.
Platform Catalog & Developer Experience
- Design, implement, and manage unified platform catalogs using Backstage, enhancing developer experience and application metadata management.
- Develop custom plugins and APIs for Backstage to support internal engineering workflows and documentation.
Automation & DevOps Excellence
- Build and maintain Python-based automation frameworks, CI/CD pipelines, and Infrastructure-as-Code (Terraform, Helm, Pulumi, AWS CDK).
- Operationalize containerized solutions using Docker and Kubernetes, integrating MLflow, Kubeflow, and other orchestration platforms.
- Implement robust automation for provisioning, configuring, and managing cloud resources across multiple environments.
MLOps & Reliability Engineering
- Lead the implementation of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and advanced observability (Prometheus, Grafana, PagerDuty).
- Develop and maintain APIs and services for model management, feature stores, and inference pipelines.
- Operationalize ML model serving at scale using frameworks such as TensorFlow Serving, TorchServe, KServe, and Seldon Core.
- Ensure compliance with industry standards (e.g., HIPAA, FDA) for data protection and reliability.
Collaboration & Leadership
- Mentor engineers and lead cross-functional teams to deliver integrated solutions.
- Champion engineering excellence through design documentation, code reviews, and testing automation.
- Present at industry summits, author technical proposals, and contribute to open-source projects (Kubernetes, Helm, Go, Envoy).
Continuous Improvement
- Drive agile delivery, sprint planning, and performance optimization.
- Lead incident response and disaster recovery initiatives for mission-critical platforms.
- Foster a culture of shared ownership, transparency, and innovation.
BASIC QUALIFICATIONS
- 8 years of hands-on software engineering experience in cloud infrastructure, DevOps, and MLOps.
- Deep expertise in Python, Kubernetes, Terraform, Helm, and CI/CD pipeline development.
- Proven experience architecting and operating containerized solutions on AWS, GCP, and Azure.
- Strong knowledge of Infrastructure-as-Code, distributed systems, and production system reliability.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
PREFERRED QUALIFICATIONS
- Expertise in AWS cloud services (EC2, S3, Lambda, EKS, SageMaker, API Gateway, CloudFormation, IAM, etc.).
- Experience deploying and customizing Backstage as a unified catalog for teams, services, and technical documentation.
- Experience building and deploying microservices and REST/gRPC APIs for AI model delivery.
- Familiarity with MLflow, Kubeflow, and other MLOps orchestration platforms.
- Proficiency with model serving frameworks (TensorFlow Serving, TorchServe, KServe, Seldon Core, BentoML, etc.).
Work Location Assignment: Tokyo HQ
Pfizer is an equal opportunity employer and complies with all applicable equal employment opportunity legislation in each jurisdiction in which it operates.
Information & Business Tech
Required Experience:
Senior Manager