Job description
The DevOps & ML Ops Engineer will be responsible for developing and maintaining scalable, stable services that deliver machine learning models to end users with guaranteed uptime. The primary focus will be on infrastructure deployment and continuous integration/continuous delivery (CI/CD) processes for our ML services.
RESPONSIBILITIES:
Manage resource allocation and workload scheduling for multiple ML services, ensuring efficient utilization of CPU/GPU resources and creating reliable queues based on service priorities.
Maintain VM environments, manage OS updates, and keep an up-to-date VM inventory.
Work alongside the Dev and QA teams to detect hot spots in our applications and set preventative measures before they become live issues.
Troubleshoot and provide solutions for system configurations.
Plan, execute, and test disaster recovery.
Monitor and examine all application performance, event, and system logs to assist in troubleshooting.
File all IT/Colocation tickets, ensuring fulfilment of requests and escalating to the right person if necessary.
Design, develop, and maintain the infrastructure required for deploying and scaling machine learning services.
Implement and manage the CI/CD pipelines to ensure seamless and efficient deployment of ML models.
Collaborate with data scientists, ML researchers, and language experts to understand the requirements for deploying ML models and provide the necessary infrastructure support.
Automate and streamline the build, test, and deployment processes to enhance efficiency and reduce time-to-market.
Monitor and optimize the performance, availability, and scalability of production ML systems.
Develop and maintain robust monitoring, logging, and alerting systems to proactively identify and address issues.
Implement security best practices to protect sensitive data and ensure compliance with relevant regulations.
Stay up to date with industry trends and emerging technologies related to ML Ops and DevOps, and propose innovative solutions to improve our ML service delivery.
Job requirements
REQUIRED SKILLS, EXPERIENCE, AND QUALIFICATIONS:
Strong knowledge of cloud platforms (such as AWS, Azure, or GCP) and local cluster deployments, and experience in deploying and managing ML services on these platforms.
Knowledge of distributed computing frameworks (e.g. Spark) and big data technologies (e.g. Hadoop, Kafka).
Proficiency in Python, Shell, Ruby, Golang, or C, and experience with infrastructure-as-code tools (e.g. Terraform, CloudFormation).
Hands-on experience with containerization technologies (e.g. Docker) and orchestration frameworks (e.g. Kubernetes).
Familiarity with CI/CD tools (e.g. Jenkins, GitLab CI/CD) and version control systems (e.g. Git).
Solid understanding of networking, security, and system administration concepts.
Strong problem-solving and troubleshooting skills, with the ability to quickly analyze and resolve issues in complex ML systems.
Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
Bachelor's or higher degree in Computer Science, Engineering, or a related field.
Proven experience as an ML Ops Engineer, DevOps Engineer, or in a similar role, with a focus on deploying and maintaining machine learning models in production environments.
DESIRED SKILLS AND EXPERIENCE:
Experience with machine learning frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn.
Familiarity with serverless computing and event-driven architectures.
Experience with logging and monitoring tools (e.g. ELK Stack, Prometheus, Grafana).
Understanding of software development methodologies and agile practices.
Hybrid
- Madrid, Comunidad de Madrid, Spain
- Malaga, Andalucía, Spain
- Palma de Mallorca, Illes Balears (Islas Baleares), Spain
- Barcelona, Catalunya (Cataluña), Spain
Tech
Full-time, Permanent
Required Experience:
IC