ML Ops Engineer

Circadia Health


Job Location:

London - UK

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

Position Overview

As an ML Ops Engineer at Circadia Health, you will own the infrastructure and operational lifecycle of the machine learning systems that power our clinical monitoring platform. You will build and maintain the production ML pipelines, deployment infrastructure, and monitoring systems that enable Circadia's predictive models to identify early signs of clinical deterioration.

Reporting to the Principal ML Engineer, you will work across ML, backend, data, and clinical teams to ensure models are reliably trained, versioned, deployed, and monitored in both cloud and edge environments. You will be a key driver in elevating Circadia's ML practice from reproducibility and experiment tracking to CI/CD for models and operational observability.

This is a high-ownership role at a lean company where production reliability, rapid iteration, and pragmatic engineering are essential. Your work will directly impact patient outcomes by ensuring our predictive models are always running, always accurate, and always improving.

Key Responsibilities

    • Own and extend Circadia's ML pipeline orchestration using Apache Airflow, including training, evaluation, and deployment workflows.
    • Build and maintain automated pipelines for model retraining, validation, and promotion across development, staging, and production environments.
    • Implement pipeline monitoring, alerting, and failure recovery to eliminate silent failures and ensure operational reliability.
    • Design pipeline architectures that support rapid experimentation while enforcing production-grade reproducibility.
    • Deploy and manage ML models on AWS infrastructure (e.g. AWS Batch for batch inference workloads).
    • Support deployment of models to edge devices, including Circadia's clinical monitoring hardware, working with firmware and embedded engineering teams as needed.
    • Manage model versioning, promotion, and rollback workflows through the MLflow model registry.
    • Evaluate and implement strategies for safe model rollouts (e.g. shadow deployments, canary releases) as the platform matures.
    • Maintain and improve the MLflow-based experiment tracking and model registry infrastructure.
    • Establish conventions for experiment logging, artifact storage, model metadata, and lineage tracking.
    • Enable ML engineers to move seamlessly from experimentation to production deployment with minimal friction.
    • Implement and maintain training data versioning and dataset management practices to ensure reproducibility of model training runs.
    • Track dataset lineage, labeling provenance, and feature dependencies alongside model versions.
    • Collaborate with ML engineers and data engineers to formalise dataset release and validation workflows.
    • Build monitoring systems for model performance in production, including data drift detection, prediction quality tracking, and alerting on degradation.
    • Implement operational dashboards for pipeline health, compute utilisation, and deployment status.
    • Collaborate with data engineering to ensure upstream data quality and pipeline reliability for ML feature inputs.
    • Develop incident response procedures and runbooks for ML system failures.
    • Manage and optimise AWS compute resources (Batch, EC2, or similar) used for model training and inference.
    • Design infrastructure-as-code solutions for reproducible ML environments.
    • Drive cost optimisation across ML compute, storage, and data transfer.
    • Support Snowflake integrations for feature generation and training data pipelines.
    • Introduce and champion ML engineering best practices, including CI/CD for models, automated testing for ML pipelines, and reproducible training workflows.
    • Build internal tooling and templates that accelerate the ML development-to-production cycle.
    • Document operational processes, architecture decisions, and onboarding materials for the ML platform.
    • Participate in architecture discussions and technical planning to ensure ML systems scale with Circadia's growth.
    • Ensure all ML pipelines and infrastructure meet healthcare security and privacy requirements, including HIPAA and SOC 2.
    • Apply best practices for handling Protected Health Information (PHI) in training data, model artifacts, and inference outputs.
    • Maintain audit trails for model decisions, data access, and deployment history.

Required Qualifications

    • 4 years of experience in MLOps, ML Engineering, DevOps, or a closely related infrastructure role.
    • Strong proficiency in Python for ML pipeline development, tooling, and automation.
    • Hands-on experience with ML pipeline orchestration tools, particularly Apache Airflow.
    • Experience with model registries and experiment tracking platforms (MLflow preferred).
    • Experience deploying and operating ML workloads on AWS (Batch, EC2, S3, IAM, CloudWatch).
    • Solid understanding of the ML lifecycle: training, evaluation, deployment, monitoring, and retraining.
    • Experience with containerisation (Docker) and infrastructure-as-code.
    • Proficiency with Git and version control workflows.
    • Familiarity with SQL and data warehousing platforms (Snowflake preferred).
    • Experience implementing monitoring, logging, and alerting for production systems.
    • Strong debugging and incident response skills for complex distributed systems.

Preferred Qualifications

    • Experience deploying models to edge or embedded devices.
    • Background in healthcare, medical devices, or clinical data systems.
    • Familiarity with model serving frameworks (e.g. TorchServe, TF Serving, Triton, or custom solutions).
    • Experience with CI/CD systems for ML (e.g. GitHub Actions, Jenkins, or similar).
    • Experience with data versioning tools (e.g. DVC, LakeFS, or similar).
    • Experience supporting data science or ML research teams in a production context.
    • Exposure to HIPAA compliance and healthcare security best practices.
    • Experience with distributed compute frameworks (e.g. Apache Spark, Dask) for large-scale data processing.
    • Experience with streaming or real-time inference architectures.

What You Bring

    • You take ownership of ML infrastructure end-to-end, from training pipelines to production monitoring.
    • You care deeply about reliability, reproducibility, and operational excellence in ML systems.
    • You have strong opinions (loosely held) on how to build a great ML platform, and you're eager to put them into practice.
    • You are comfortable working in a startup environment where you'll wear multiple hats and move fast.
    • You communicate clearly across engineering, data science, and clinical teams.
    • You're motivated by building technology that directly improves patient care.

Why Circadia Health

Circadia Health is redefining patient monitoring through contactless sensing and AI-driven clinical insights. As we scale from tens of thousands to hundreds of thousands of monitored patients, our data infrastructure is central to everything we do.

You'll have the opportunity to:
- Work on real-world healthcare problems with measurable patient impact
- Build data systems that power clinical-grade AI and ML
- Take ownership in a fast-growing, mission-driven company
- Collaborate with a highly skilled multidisciplinary team

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

Required Experience:

IC



About Company

Enabling early detection of respiratory failure, powered by contactless sensing, clinical software, and artificial intelligence.
