This role requires a blend of skills in software engineering, machine learning, and operations to ensure the smooth functioning of ML systems in production. In this role you will:
- Lead the team to design and implement automation for model training, testing, validation, and deployment
- Collaborate with machine learning engineers to ensure efficient deployment and scaling of ML models
- Implement monitoring and alerting systems to track model performance, system health, and data drift
- Optimize compute resources for cost and performance efficiency
- Manage model versions to ensure traceability and reproducibility
6 years of experience in the design and implementation of large-scale ML systems or distributed systems
Experience with model pipeline and registry tools, detecting and preventing model drift, automating model monitoring, and ensuring model accuracy
Proficiency in programming languages such as Python, Java, or Golang
Effective communication skills in written and spoken English
Bachelor's degree or above in Software Engineering, Computer Science, Machine Learning, or a related field
Experience with machine learning frameworks such as TensorFlow, PyTorch, AutoGluon, XGBoost, or Scikit-learn
Experience with DevOps tools such as Docker, Jenkins, Ansible, Grafana, Prometheus, Elastic, or Kubernetes
Familiar with CI/CD deployment practices
Experience with SQL and database systems such as PostgreSQL
Experience building ETL pipelines in a data warehouse such as Snowflake