Senior Site Reliability Engineer - AI/ML

Visa

Posted on : 09-07-2025

1 Vacancy

Job Location

Bangalore - India

Monthly Salary

Not Disclosed

Job Description

Position Summary:

The Reliability Engineering Automation team prides itself on keeping Visa systems up and secure, catering to the 24x7 needs of the business. The GenAI Senior Site Reliability Engineer is a highly motivated senior individual contributor, based in Bengaluru, India, responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The role calls for a senior technologist with a passion for solving problems by developing systems and software that increase site reliability and performance. Site reliability engineering (SRE) fuses the software engineering and operations disciplines in the GenAI ecosystem.

Responsibilities:

- System Reliability: Ensure the uptime, reliability, and scalability of GenAI platforms and services.

- Monitoring & Alerting: Design, implement, and improve monitoring, logging, and alerting for AI workloads and infrastructure.

- Incident Response: Respond to, investigate, and resolve production incidents, ensuring minimal disruption to GenAI services.

- Automation: Develop and maintain automation scripts for deployment, scaling, and recovery of GenAI systems.

- Performance Optimization: Analyze system bottlenecks and optimize resource utilization for AI model training and inference.

- Collaboration: Work closely with ML engineers, data scientists, DevOps, and platform teams to support end-to-end GenAI pipelines.

- Security & Compliance: Implement robust security practices and ensure compliance with relevant data and AI regulations.

- Documentation: Maintain clear documentation for processes, runbooks, and system architecture.

Required Skills:

- Kubernetes & Containers: Proficiency in Kubernetes, Docker, and related tools for orchestration of AI workloads.

- Infrastructure as Code: Skills in Terraform, Ansible, or similar.

- Monitoring & Logging: Familiarity with Prometheus, Grafana, the ELK stack, or similar tools.

- Scripting & Programming: Ability to write scripts (Python, Bash, Go, etc.) for automation and tooling.

- CI/CD Pipelines: Knowledge of CI/CD workflows, especially for ML/AI projects.

- AI/ML Workloads: Understanding of the ML model lifecycle, distributed training, and inference serving (e.g., using Ray, Kubeflow, MLflow).

- Troubleshooting: Strong analytical and troubleshooting skills, especially in complex distributed environments.

This is a hybrid position. Expectation of days in office will be confirmed by your Hiring Manager.

Qualifications:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Professional Experience: 4 years as an SRE, DevOps Engineer, or similar, preferably supporting AI/ML or large-scale data platforms.
- AI/ML Infrastructure: Hands-on experience operating GPU clusters, AI frameworks (TensorFlow, PyTorch), and data pipelines is a plus.
- Incident Management: Demonstrated experience in high-severity incident response and postmortem analysis.
- Collaboration: Experience working in cross-functional teams, especially with AI/ML practitioners.

Additional Information:

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Employment Type:

Full-time
