Job Title: ML Ops Lead / ML Ops Engineer
Location: Dallas, TX
Duration: 6 months, with possible extension.
Job description:
- Build & Automate ML Pipelines: Design, implement, and maintain CI/CD pipelines for machine learning models, ensuring automated data ingestion, model training, testing, versioning, and deployment.
- Operationalize Models: Collaborate closely with data scientists to containerize, optimize, and deploy their models to production, focusing on reproducibility, scalability, and performance.
- Infrastructure Management: Design and manage the underlying cloud infrastructure (AWS) that powers our MLOps platform, leveraging Infrastructure-as-Code (IaC) tools to ensure consistency and cost optimization.
- Monitoring & Observability: Implement comprehensive monitoring, alerting, and logging solutions to track model performance, data integrity, and pipeline health in real time. Proactively address issues like model or data drift.
- Governance & Security: Establish and enforce best practices for model and data versioning, auditability, security, and access control across the entire machine learning lifecycle.
- Tooling & Frameworks: Develop and maintain reusable tools and frameworks to accelerate the ML development process and empower data science teams.
- Cloud Expertise: Extensive hands-on experience in designing and implementing MLOps solutions on AWS. Proficient with core services like SageMaker, S3, ECS, EKS, Lambda, SQS, SNS, and IAM.
- Coding & Automation: Strong coding proficiency in Python. Extensive experience with automation tools including Terraform for IaC and GitHub Actions.
- MLOps & DevOps: A solid understanding of MLOps and DevOps principles. Hands-on experience with MLOps frameworks like SageMaker Pipelines, Model Registry, Weights & Biases, MLflow, or Kubeflow, and orchestration tools like Airflow or Argo Workflows.
- Containerization: Expertise in developing and deploying containerized applications using Docker and orchestrating them with ECS and EKS.
- Model Lifecycle: Experience with model testing, validation, and performance monitoring. Good understanding of ML frameworks like PyTorch or TensorFlow is required to effectively collaborate with data scientists.
- Communication: Excellent communication and documentation skills, with a proven ability to collaborate with cross-functional teams (data scientists, data engineers, and architects).
Keywords: ML Ops, SageMaker, AWS, ECS, EKS, Lambda, Python