Associate Staff Engineer, DevOps

Mumbai - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Department:

Engineering

Job Summary

Requirements:

Experience: 5 years
Strong experience in DevOps or Site Reliability Engineering (SRE) roles.
Strong knowledge of Docker, Kubernetes, Terraform, and CI/CD pipelines.
Hands-on experience with AWS, Azure, or other cloud platforms.
Familiarity with GPU infrastructure and ML workloads is a plus.
Good understanding of monitoring and logging systems (Prometheus, Grafana).
Ability to collaborate with ML teams for optimized inference and deployment.
Strong troubleshooting and problem-solving skills in high-scale environments.
Knowledge of infrastructure security best practices, cost optimization, and performance tuning.
Exposure to vector databases and AI/ML deployment pipelines is highly desirable.

Responsibilities:

Maintain and manage Kubernetes clusters, AWS/Azure environments, and GPU infrastructure for high-performance workloads.
Design and implement CI/CD pipelines for seamless deployments and faster release cycles.
Set up and maintain monitoring and logging systems using Prometheus and Grafana to ensure system health and reliability.
Support vector database scaling and model deployment for AI/ML workloads.
Collaborate with ML engineering teams to optimize inference performance and resource utilization.
Ensure high availability, security, and scalability of infrastructure across multiple environments.
Automate infrastructure provisioning and configuration using Terraform and other IaC tools.
Troubleshoot production issues and implement proactive measures to prevent downtime.
Continuously improve deployment processes and infrastructure reliability through automation and best practices.
Participate in architecture reviews, capacity planning, and disaster recovery strategies.
Drive cost optimization initiatives for cloud resources and GPU utilization.
Stay updated with emerging technologies in cloud-native AI infrastructure and DevOps automation.

Qualifications:

Bachelor's or master's degree in Computer Science, Information Technology, or a related field.

Remote Work :

Employment Type :

Full-time

Key Skills

Computer Science
Docker
Kubernetes
Python
VMware
C/C++
Go
System Architecture
gRPC
OS Kernels
Perl
Distributed Systems

About Company

Nagarro

Nagarro helps future-proof your business through a forward-thinking, fluidic, and CARING mindset. We excel at digital engineering and help our clients become human-centric, digital-first organizations, augmenting their ability to be responsive, efficient, intimate, creative, and susta ...