MLOps Engineer

Programmers.io


Job Location:

Sunnyvale, CA - USA

Monthly Salary: Not Disclosed
Posted on: 5 hours ago
Vacancies: 1

Job Summary

Key Responsibilities

Realtime ML Operations & Playground Migration (45%)

  • Act as primary on-call and maintenance owner for the Realtime ML production stack during the Auriga migration window.
  • Monitor system health, triage and resolve incidents, and address data/model serving issues.
  • Apply security updates and dependency patches, and ensure SLA continuity for downstream consumers.
  • Lead migration of the Realtime ML Playground environment, including infra parity checks, configuration migration, integration testing, and documentation.

EKS Migration: HarperCollins Bundles (40%)

  • Execute end-to-end migration of HarperCollins service bundles to AWS EKS.
  • Author Kubernetes manifests, configure IAM and networking, and update CI/CD pipelines.
  • Validate in staging and perform a controlled production cutover.
  • Produce rollback plans and operational runbooks.

Buildings Production Pipeline (Supporting, 15%)

  • Contribute to the design and initial build-out of a pipeline streaming ML-detected missing buildings into the Basemap data flow.
  • Deliver pipeline scaffolding, integration patterns with upstream ML outputs, and schema inputs.
  • Leave a documented, partially implemented pipeline with clear handoff notes for post-engagement completion.

Key Deliverables (by End of Engagement)

  • Stable Realtime ML production environment throughout the Auriga migration, with documented incidents and resolutions.
  • Fully migrated Realtime ML Playground with handoff documentation.
  • HarperCollins bundles live on EKS with completed cutover and operational runbooks.
  • Partially implemented Buildings pipeline with documentation enabling seamless handoff.
  • All code, IaC, and documentation checked into team repositories.