AI Platform LLM Infrastructure Engineer (NVIDIA + Hybrid Cloud)

NewVision


Job Location: Pune, India

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

Role Overview

We are seeking an experienced AI Platform / LLM Infrastructure Engineer to support and operate enterprise-grade AI infrastructure for a global banking environment. This role focuses on ensuring scalability, high availability, low latency, security, and regulatory compliance across AI/LLM platforms.

The engineer will manage the core platform that enables enterprise teams to securely consume AI/LLM services via governed interfaces, spanning on-prem GPU infrastructure, Kubernetes-based workloads, LLM inference systems, and vector databases.

Key Responsibilities

  • Operate and support NVIDIA GPU infrastructure, ensuring high performance, stability, and optimal utilization

  • Administer GPU scheduling and orchestration using NVIDIA RunAI

  • Manage and optimize Kubernetes clusters running GPU-enabled workloads

  • Deploy and tune LLM inference services using vLLM and NVIDIA Triton Inference Server

  • Maintain and scale vector databases such as Elasticsearch and Milvus that support RAG workloads

  • Support data science workflows via notebook environments (e.g., Jupyter)

  • Ensure operational excellence through runbooks, documentation, change management, and audit readiness

  • Lead incident management, perform root cause analysis, and implement preventive measures

  • Drive platform scalability, capacity planning, and continuous improvement

  • Collaborate with cross-functional teams and communicate effectively during incidents

Infrastructure & Technology Environment

Hardware / Compute

  • NVIDIA GPU platforms: A100, H100, and B200 (Blackwell)

  • GPU scheduling via NVIDIA RunAI

  • Dell systems with NVIDIA RTX A6000 for visualization workloads

Operating Environment

  • Linux-based production systems with high availability

  • Containerized workloads on Kubernetes

Programming / Scripting

  • Python for automation and integrations

  • Bash for system operations and diagnostics

AI / ML Platform Stack

  • LLM inference: vLLM and NVIDIA Triton Inference Server

  • Vector databases: Elasticsearch and Milvus

  • AI enablement tools: Dataiku and Snorkel

Agentic AI (Integration Context)

  • Support infrastructure enabling agent-based AI workflows via enterprise gateways

  • Understand interactions between LLMs, retrieval systems, and inference services

  • Focus on platform stability and integration, not core agent development

Required Qualifications

  • Strong experience managing Linux-based production environments

  • Hands-on expertise with NVIDIA GPU platforms (A100/H100; B200 preferred)

  • Experience with GPU scheduling tools like NVIDIA RunAI

  • Deep operational knowledge of Kubernetes in GPU environments

  • Proficiency in Python and Bash scripting

  • Experience with LLM inference platforms such as vLLM or NVIDIA Triton Inference Server

  • Experience managing vector databases like Elasticsearch or Milvus

  • Experience working in regulated or security-sensitive environments

Preferred Qualifications

  • Familiarity with the NVIDIA AI ecosystem (CUDA, NGC containers, NeMo, NeMo Guardrails)

  • Exposure to agent-based AI systems and enterprise AI platforms

  • Experience supporting GPU-intensive visualization workloads (RTX A6000)

Key Competencies

  • Strong problem-solving and troubleshooting skills

  • Ability to manage high-performance, mission-critical systems

  • Effective communication and stakeholder collaboration

  • Ownership mindset with a focus on reliability and continuous improvement


Required Experience: Individual Contributor (IC)
