We are seeking an advanced ML Ops Engineer to design and implement the infrastructure required to host, orchestrate, and manage up to 1500 ML scoring processes within a new Databricks environment. The role focuses on operationalizing the ML scoring pipelines by setting up a scalable, secure, and well-monitored platform on which data science teams can deploy their models.
Qualifications:
Environment Configuration
- Set up Databricks clusters, jobs, and workflows for large-scale ML scoring use cases.
- Use Infrastructure as Code (e.g. Terraform) for reproducibility and governance.
- Implement scalable infrastructure capable of running thousands of ML scoring tasks.
- Configure job scheduling, parallel execution strategies, and resource optimization.
- Integrate monitoring and alerting into the platform using cloud-native tools.
- Treat security, compliance, and cost-efficiency as key pillars of the operational setup.
ML Ops Pipeline Integration
- Develop deployment processes for ML models using Databricks MLflow or equivalent.
- Implement version control and tracking for models, scoring code, and configuration files.
Execution Management
- Build frameworks to orchestrate scoring of >1500 ML models or scoring jobs.
- Ensure resilience, fault tolerance, and restart capabilities for failed jobs.
Monitoring & Observability
- Integrate logging, alerting, and dashboards to monitor scoring throughput, latency, and failures.
- Establish model performance monitoring hooks for post-scoring analytics.
Automation
- Work alongside DevOps engineers to ensure common infrastructure and processes (e.g. shared storage, Delta Lake tables) serve both ML and BI use cases.
- Automate provisioning of resources and deployments from CI/CD pipelines.
- Utilize Infrastructure as Code (IaC) where feasible for reproducibility.
Collaboration
- Work closely with data scientists, solution architects, and platform engineers to ensure a smooth handover from model development to operational scoring.
- Define operational SLAs for scoring workloads.
Additional Information:
Work from an office in Warsaw, Lublin, or Poznań 3 days a week.
Remote Work:
No
Employment Type:
Full-time