Site Reliability Engineer

528

Posted on : 12-04-2025

Employer Active

1 Vacancy

Job Location

Chennai - India

Monthly Salary

Not Disclosed

Job Description

Job Summary

We are seeking an experienced Site Reliability Engineer with AI/MLOps experience to support the development and optimization of our ERP product, primarily in Azure and Windows environments. This role combines MLOps expertise with Site Reliability Engineering (SRE) principles to ensure the reliable, scalable, and cost-efficient deployment of AI models. The ideal candidate will focus on improving security, compliance, and operational efficiency, collaborating with North American and global teams to meet business objectives.

Key Responsibilities

AI/MLOps Pipeline: Build and optimize CI/CD pipelines to automate the training, testing, and deployment of AI models on Azure, with a strong emphasis on improving efficiency and reducing costs.
Azure Infrastructure Management: Manage and maintain scalable, secure infrastructure using Azure services such as Azure Machine Learning, AKS, and Virtual Machines. Continuously optimize resource usage and implement cost-saving measures.
Windows Server Management: Oversee Windows-based servers hosted on Azure, ensuring they meet performance, security, and compliance requirements while also identifying and executing cost-saving opportunities.
Cost Optimization: Analyze and manage infrastructure costs by identifying unused or underused resources and implementing optimization strategies to drive cost savings.
Monitoring & Performance Optimization: Monitor the health, performance, and costs of AI models and services using Azure Monitor, New Relic, and other tools. Identify performance bottlenecks and optimize for both operational efficiency and cost reduction.
Model Versioning & Governance: Assist in managing model version control, governance, and lifecycle processes, with a focus on cost-effective operations.
Cross-functional Collaboration: Collaborate with data scientists, AI engineers, and software developers to support the efficient deployment and operationalization of AI models while actively seeking ways to minimize costs.
Incident Management & Automation: Participate in incident resolution and automate tasks to reduce manual work, improve system reliability, and lower operational overhead.
Security & Compliance Assurance: Ensure AI/ML workloads comply with security and regulatory standards, implementing cost-efficient solutions to enhance security and data protection.

Qualifications

Experience: 2-5 years in MLOps, SRE, or similar roles focusing on Azure and Windows environments.
Cloud Skills: Proficient in Azure services, managing infrastructure, and Windows workloads.
SRE Knowledge: Familiar with Site Reliability Engineering principles like monitoring and automation.
DevOps: Hands-on experience with CI/CD tools like Azure DevOps.
Scripting: Skilled in PowerShell and Python for automation.
Containers: Knowledge of Docker and Kubernetes for deploying AI/ML applications.
Windows Admin: Strong experience managing Windows Servers and related services.
AI/ML Knowledge: Understanding of AI/ML workflows and model deployment.

Nice-to-Have

Experience with Infrastructure-as-Code tools like Terraform.
Azure certifications (e.g., Azure AI Engineer, Azure DevOps Engineer).
Experience implementing cost-saving strategies in cloud environments.

Soft Skills

Strong problem-solving skills with the ability to troubleshoot complex issues.
Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
A passion for innovation and continuous improvement in AI/ML systems.

Employment Type

Full-Time

About Company

528
