ML Ops Site Reliability Engineer

RECEX

Posted on : 07-06-2025

Employer Active

1 Vacancy

Job Location

Chennai - India

Monthly Salary

Not Disclosed

Job Description

We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team.

In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows.

This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

Design, implement, and maintain scalable and reliable machine learning infrastructure.

Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.

Develop and maintain CI/CD pipelines for machine learning workflows.

Monitor and optimize the performance of machine learning systems and infrastructure.

Implement and manage automated testing and validation processes for machine learning models.

Ensure the security and compliance of machine learning systems and data.

Troubleshoot and resolve issues related to machine learning infrastructure and workflows.

Document processes, procedures, and best practices for machine learning operations.

Stay up to date with the latest developments in MLOps and related technologies.

Qualifications:

Required:

Bachelor's degree in Computer Science, Engineering, or a related field.

Proven experience as a Site Reliability Engineer (SRE) or in a similar role.

Strong knowledge of machine learning concepts and workflows.

Proficiency in programming languages such as Python, Java, or Go.

Experience with cloud platforms such as AWS, Azure, or Google Cloud.

Familiarity with containerization technologies like Docker and Kubernetes.

Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.

Strong problem-solving skills and the ability to troubleshoot complex issues.

Excellent communication and collaboration skills.

Preferred:

Master's degree in Computer Science, Engineering, or a related field.

Experience with machine learning frameworks such as TensorFlow, PyTorch, or scikit-learn.

Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.

Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.

Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.

Experience with automated testing frameworks for machine learning models.

Knowledge of security best practices for machine learning systems and data.

Employment Type

Full Time
