MLOps Engineer
Job Summary
In this role, you will be part of the analytics team, playing a critical role in the industrialization of data science to drive significant business impact. As an MLOps Engineer, you will lead the large-scale deployment and maintenance of complex machine learning pipelines. You will build and manage the underlying infrastructure that allows AI models to move seamlessly from research to production, ensuring that these systems are not only high-performing but also safe, observable, and robust. You will be responsible for architecting operational workflows that bridge the gap between data science development and enterprise-grade IT operations, ensuring that every model is backed by rigorous automation and monitoring.
Your responsibilities will include designing and implementing automated CI/CD pipelines using Tekton to orchestrate the end-to-end lifecycle of ML models. You will be a champion of operational excellence, implementing advanced observability and traceability frameworks to monitor model health and system performance in real time. You will use Dynatrace and centralized logging solutions to build comprehensive monitoring pipelines that track latency, resource utilization, and system errors. Additionally, you will be responsible for maintaining the stability of large-scale deployments, ensuring that all AI assets integrate seamlessly with the internal ecosystem via standardized protocols. Development and deployment will occur exclusively within the GCP ecosystem, using Vertex AI, Cloud Run, GKE, and BigQuery.
Responsibilities
- Architect and manage the end-to-end deployment of machine learning models across production environments, ensuring scalability and high availability.
- Design, build, and maintain automated CI/CD pipelines using Tekton to streamline model development, testing, and release cycles.
- Implement and manage comprehensive observability and traceability frameworks to monitor model health, data drift, and system performance in real time.
- Configure advanced monitoring solutions using Dynatrace and centralized logging systems to track latency, resource utilization, and system errors.
- Develop and maintain MLOps infrastructure exclusively within the GCP ecosystem, using Vertex AI, Google Kubernetes Engine (GKE), and BigQuery.
- Automate model retraining, validation, and deployment workflows to ensure models remain accurate and performant in production.
- Partner with data scientists and software engineers to transition models from research/prototypes to robust enterprise-grade production assets.
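To give a concrete flavor of the automated retraining and validation workflows above, here is a minimal illustrative sketch of a promotion gate that decides whether a retrained model may replace the live one. The function name, metric keys, and thresholds are all hypothetical, not part of any Ford system:

```python
def should_promote(candidate_metrics: dict, production_metrics: dict,
                   min_accuracy: float = 0.90,
                   max_regression: float = 0.01) -> bool:
    """Gate a retrained model: promote only if it clears an absolute
    accuracy floor AND does not regress against the live model beyond
    a small tolerance. Thresholds here are illustrative placeholders."""
    candidate_acc = candidate_metrics["accuracy"]
    production_acc = production_metrics["accuracy"]

    # Reject models below the absolute quality floor.
    if candidate_acc < min_accuracy:
        return False
    # Reject models that regress meaningfully versus production.
    if production_acc - candidate_acc > max_regression:
        return False
    return True
```

In practice a gate like this would run as a step in the CI/CD pipeline after automated evaluation, with metrics pulled from the validation job rather than passed in by hand.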
Qualifications
3 years of professional experience in MLOps, DevOps, or Software Engineering, with a specific focus on the industrialization of machine learning models.
Bachelor's or Master's degree in a quantitative field (e.g., Computer Science, Engineering, Statistics, or Mathematics).
Proven track record of building and maintaining complex automated pipelines using Tekton or similar orchestration tools.
Demonstrated experience implementing enterprise-grade monitoring, logging, and distributed tracing in a professional environment.
Deep understanding of the GCP stack, particularly services related to model hosting, orchestration, and data management.
Required Experience:
IC
About Company
Ford® is Built for America. Discover the latest lineup of new Ford vehicles! Explore hybrid & electric vehicle options, see photos, build & price, search inventory, view pricing & incentives, and see the latest technology & news happening at Ford.