AI Platform LLM Infrastructure Engineer (NVIDIA + Hybrid Cloud)
Role Overview
We are seeking an experienced AI Platform / LLM Infrastructure Engineer to support and operate enterprise-grade AI infrastructure for a global banking environment. This role focuses on ensuring scalability, high availability, low latency, security, and regulatory compliance across AI/LLM platforms.
The engineer will manage the core platform that enables enterprise teams to securely consume AI/LLM services via governed interfaces, spanning on-prem GPU infrastructure, Kubernetes-based workloads, LLM inference systems, and vector databases.
Key Responsibilities
Operate and support NVIDIA GPU infrastructure, ensuring high performance, stability, and optimal utilization
Administer GPU scheduling and orchestration using NVIDIA RunAI
Manage and optimize Kubernetes clusters running GPU-enabled workloads
Deploy and tune LLM inference services using vLLM and NVIDIA Triton Inference Server (see the inference sketch after this list)
Maintain and scale vector databases such as Elasticsearch and Milvus that support retrieval-augmented generation (RAG) workloads
Support data science workflows via notebook environments (e.g., Jupyter)
Ensure operational excellence through runbooks, documentation, change management, and audit readiness
Lead incident management, perform root cause analysis, and implement preventive measures
Drive platform scalability, capacity planning, and continuous improvement
Collaborate with cross-functional teams and communicate effectively during incidents
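To give a concrete flavor of the inference-service work listed above, below is a minimal smoke-test sketch against a vLLM server exposing its OpenAI-compatible HTTP API; the endpoint URL and model name are hypothetical placeholders, not details of the actual platform.

```python
"""Minimal smoke test against a vLLM OpenAI-compatible endpoint.

Assumptions (hypothetical, for illustration only):
- a vLLM server is reachable at http://llm-gateway.internal:8000
- the served model is registered as "meta-llama/Llama-3-8b-instruct"
"""
import requests

BASE_URL = "http://llm-gateway.internal:8000"  # placeholder endpoint


def smoke_test(prompt: str) -> str:
    # vLLM's OpenAI-compatible server exposes /v1/chat/completions
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={
            "model": "meta-llama/Llama-3-8b-instruct",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
            "temperature": 0.0,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(smoke_test("Reply with the single word: healthy"))
```

A probe like this is the kind of check that typically backs readiness gates and post-change validation in the runbooks mentioned above.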
Infrastructure & Technology Environment
Hardware / Compute
NVIDIA GPU platforms: A100, H100, and B200 (Blackwell)
GPU scheduling via NVIDIA RunAI
Dell systems with NVIDIA RTX A6000 for visualization workloads
Operating Environment
Linux-based production systems with high availability
Containerized workloads on Kubernetes
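As a small illustration of day-to-day Kubernetes GPU operations in such an environment, here is a sketch that uses the official kubernetes Python client to count pods requesting the nvidia.com/gpu resource; it assumes a valid kubeconfig for the target cluster and is not specific to this platform.

```python
"""Count pods that request NVIDIA GPUs, per namespace.

A minimal sketch using the official `kubernetes` Python client;
assumes a valid kubeconfig for the target cluster.
"""
from collections import Counter
from kubernetes import client, config


def gpu_pods_by_namespace() -> Counter:
    config.load_kube_config()  # or config.load_incluster_config() when run inside a pod
    v1 = client.CoreV1Api()
    counts = Counter()
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        for c in pod.spec.containers:
            limits = (c.resources.limits or {}) if c.resources else {}
            if "nvidia.com/gpu" in limits:
                counts[pod.metadata.namespace] += 1
                break  # count each pod once, even with multiple GPU containers
    return counts


if __name__ == "__main__":
    for ns, n in gpu_pods_by_namespace().most_common():
        print(f"{ns}: {n} GPU pod(s)")
```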
Programming / Scripting
Python for automation and integrations
Bash for system operations and diagnostics
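As an example of the automation and diagnostics scripting expected in this role, a short sketch that wraps nvidia-smi to report per-GPU utilization and memory; it assumes nvidia-smi is available on the node where it runs.

```python
"""Report per-GPU utilization and memory via nvidia-smi.

A small automation sketch; assumes it runs on a GPU node where
nvidia-smi is available on the PATH.
"""
import csv
import subprocess
from io import StringIO

QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def gpu_report() -> list[dict]:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for fields in csv.reader(StringIO(out)):
        idx, name, util, used, total = [f.strip() for f in fields]
        rows.append({"gpu": idx, "name": name, "util_pct": int(util),
                     "mem_used_mib": int(used), "mem_total_mib": int(total)})
    return rows


if __name__ == "__main__":
    for r in gpu_report():
        print(f"GPU{r['gpu']} {r['name']}: {r['util_pct']}% util, "
              f"{r['mem_used_mib']}/{r['mem_total_mib']} MiB")
```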
AI / ML Platform Stack
LLM inference: vLLM, NVIDIA Triton Inference Server
Vector databases: Elasticsearch, Milvus (see the retrieval sketch after this list)
AI enablement tools: Dataiku, Snorkel
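To show the retrieval path such a stack typically serves, here is a rough sketch of a vector similarity search with pymilvus; the Milvus URI, collection name, and field names are hypothetical, and the query embedding is assumed to come from an upstream embedding service.

```python
"""Vector similarity search behind a RAG workload (sketch).

Assumptions (hypothetical): a Milvus instance at localhost:19530,
a collection "docs" with an embedding field and a "text" field,
and a query embedding produced by an upstream embedding service.
"""
from pymilvus import MilvusClient


def retrieve_context(query_embedding: list[float], top_k: int = 5) -> list[str]:
    client = MilvusClient(uri="http://localhost:19530")  # placeholder URI
    hits = client.search(
        collection_name="docs",       # placeholder collection name
        data=[query_embedding],       # one query vector
        limit=top_k,
        output_fields=["text"],       # placeholder stored-text field
    )
    # `hits` holds one result list per query vector; keep the stored text snippets
    return [hit["entity"]["text"] for hit in hits[0]]
```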
Agentic AI (Integration Context)
Support infrastructure enabling agent-based AI workflows via enterprise gateways
Understand interactions between LLMs, retrieval systems, and inference services
Focus on platform stability and integration, not core agent development
Required Qualifications
Strong experience managing Linux-based production environments
Hands-on expertise with NVIDIA GPU platforms (A100/H100; B200 preferred)
Experience with GPU scheduling tools like NVIDIA RunAI
Deep operational knowledge of Kubernetes in GPU environments
Proficiency in Python and Bash scripting
Experience with LLM inference platforms such as vLLM or NVIDIA Triton Inference Server
Experience managing vector databases like Elasticsearch or Milvus
Experience working in regulated or security-sensitive environments
Preferred Qualifications
Familiarity with the NVIDIA AI ecosystem (CUDA, NGC containers, NeMo, NeMo Guardrails)
Exposure to agent-based AI systems and enterprise AI platforms
Experience supporting GPU-intensive visualization workloads (RTX A6000)
Key Competencies
Strong problem-solving and troubleshooting skills
Ability to manage high-performance, mission-critical systems
Effective communication and stakeholder collaboration
Ownership mindset with focus on reliability and continuous improvement
Required Experience:
Individual Contributor (IC)