Senior Site Reliability Engineer - AI/ML

Job Location

Bangalore - India

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Position Summary:

The Reliability Engineering Automation team prides itself on keeping Visa systems up and secure, catering to the 24x7 needs of the business. The GenAI Senior Site Reliability Engineer is a highly motivated senior individual contributor based in Bengaluru, India, responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The role suits a senior technologist with a passion for solving problems by developing systems and software that increase site reliability and performance. Site reliability engineering (SRE) fuses the software engineering and operations disciplines in the GenAI ecosystem.

Responsibilities:

- System Reliability: Ensure the uptime, reliability, and scalability of GenAI platforms and services.

- Monitoring & Alerting: Design, implement, and improve monitoring, logging, and alerting for AI workloads and infrastructure.

- Incident Response: Respond to, investigate, and resolve production incidents, ensuring minimal disruption to GenAI services.

- Automation: Develop and maintain automation scripts for deployment, scaling, and recovery of GenAI systems.

- Performance Optimization: Analyze system bottlenecks and optimize resource utilization for AI model training and inference.

- Collaboration: Work closely with ML engineers, data scientists, DevOps, and platform teams to support end-to-end GenAI pipelines.

- Security & Compliance: Implement robust security practices and ensure compliance with relevant data and AI regulations.

- Documentation: Maintain clear documentation for processes, runbooks, and system architecture.

Required Skills:

- Kubernetes & Containers: Proficiency in Kubernetes, Docker, and related tools for orchestration of AI workloads.

- Infrastructure as Code: Skills in Terraform, Ansible, or similar.

- Monitoring & Logging: Familiarity with Prometheus, Grafana, the ELK stack, or similar tools.

- Scripting & Programming: Ability to write scripts (Python, Bash, Go, etc.) for automation and tooling.

- CI/CD Pipelines: Knowledge of CI/CD workflows, especially for ML/AI projects.

- AI/ML Workloads: Understanding of the ML model lifecycle, distributed training, and inference serving (e.g. using Ray, Kubeflow, MLflow).

- Troubleshooting: Strong analytical and troubleshooting skills, especially in complex distributed environments.

 

This is a hybrid position. The expected number of days in the office will be confirmed by your Hiring Manager.


Qualifications:

Bachelor's or master's degree in computer science, engineering, or a related field.
- Professional Experience: 4 years as an SRE, DevOps Engineer, or similar, preferably supporting AI/ML or large-scale data platforms.
- AI/ML Infrastructure: Hands-on experience operating GPU clusters, AI frameworks (TensorFlow, PyTorch), and data pipelines is a plus.
- Incident Management: Demonstrated experience in high-severity incident response and postmortem analysis.
- Collaboration: Experience working in cross-functional teams, especially with AI/ML practitioners.


Additional Information:

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.


Remote Work:

No


Employment Type:

Full-time
