AI Platform LLM Infrastructure Engineer (NVIDIA + Hybrid Cloud)
Role Overview
We are seeking an experienced AI Platform / LLM Infrastructure Engineer to support and operate enterprise-grade AI infrastructure for a global banking environment. This role focuses on ensuring scalability, high availability, low latency, security, and regulatory compliance across AI/LLM platforms.
The engineer will manage the core platform that enables enterprise teams to securely consume AI/LLM services via governed interfaces, spanning on-prem GPU infrastructure, Kubernetes-based workloads, LLM inference systems, and vector databases.
Key Responsibilities
Operate and support NVIDIA GPU infrastructure, ensuring high performance, stability, and optimal utilization
Administer GPU scheduling and orchestration using NVIDIA RunAI
Manage and optimize Kubernetes clusters running GPU-enabled workloads
Deploy and tune LLM inference services using vLLM and NVIDIA Triton Inference Server (see the inference sketch after this list)
Maintain and scale vector databases such as Elasticsearch and Milvus that support retrieval-augmented generation (RAG) workloads
Support data science workflows via notebook environments (e.g., Jupyter)
Ensure operational excellence through runbooks, documentation, change management, and audit readiness
Lead incident management, perform root cause analysis, and implement preventive measures
Drive platform scalability, capacity planning, and continuous improvement
Collaborate with cross-functional teams and communicate effectively during incidents
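To give a concrete flavor of the inference-service work listed above, below is a minimal smoke-test sketch against a vLLM server exposing its OpenAI-compatible HTTP API; the endpoint URL and model name are hypothetical placeholders, not details of the actual platform.

```python
"""Minimal smoke test against a vLLM OpenAI-compatible endpoint.

Assumptions (hypothetical, for illustration only):
- a vLLM server is reachable at http://llm-gateway.internal:8000
- the served model is registered as "meta-llama/Llama-3-8b-instruct"
"""
import requests

BASE_URL = "http://llm-gateway.internal:8000"  # placeholder endpoint


def smoke_test(prompt: str) -> str:
    # vLLM's OpenAI-compatible server exposes /v1/chat/completions
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={
            "model": "meta-llama/Llama-3-8b-instruct",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
            "temperature": 0.0,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(smoke_test("Reply with the single word: healthy"))
```

A probe like this is the kind of check that typically backs readiness gates and post-change validation in the runbooks mentioned above.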
Infrastructure & Technology Environment
Hardware / Compute
NVIDIA GPU platforms: A100, H100, and B200 (Blackwell)
GPU scheduling via NVIDIA RunAI
Dell systems with NVIDIA RTX A6000 for visualization workloads
Operating Environment
Linux-based production systems with high availability
Containerized workloads on Kubernetes
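As a small illustration of day-to-day Kubernetes GPU operations in such an environment, here is a sketch that uses the official kubernetes Python client to count pods requesting the nvidia.com/gpu resource; it assumes a valid kubeconfig for the target cluster and is not specific to this platform.

```python
"""Count pods that request NVIDIA GPUs, per namespace.

A minimal sketch using the official `kubernetes` Python client;
assumes a valid kubeconfig for the target cluster.
"""
from collections import Counter
from kubernetes import client, config


def gpu_pods_by_namespace() -> Counter:
    config.load_kube_config()  # or config.load_incluster_config() when run inside a pod
    v1 = client.CoreV1Api()
    counts = Counter()
    for pod in v1.list_pod_for_all_namespaces(watch=False).items:
        for c in pod.spec.containers:
            limits = (c.resources.limits or {}) if c.resources else {}
            if "nvidia.com/gpu" in limits:
                counts[pod.metadata.namespace] += 1
                break  # count each pod once, even with multiple GPU containers
    return counts


if __name__ == "__main__":
    for ns, n in gpu_pods_by_namespace().most_common():
        print(f"{ns}: {n} GPU pod(s)")
```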
Programming / Scripting
Python for automation and integrations
Bash for system operations and diagnostics
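As an example of the automation and diagnostics scripting expected in this role, a short sketch that wraps nvidia-smi to report per-GPU utilization and memory; it assumes nvidia-smi is available on the node where it runs.

```python
"""Report per-GPU utilization and memory via nvidia-smi.

A small automation sketch; assumes it runs on a GPU node where
nvidia-smi is available on the PATH.
"""
import csv
import subprocess
from io import StringIO

QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def gpu_report() -> list[dict]:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for fields in csv.reader(StringIO(out)):
        idx, name, util, used, total = [f.strip() for f in fields]
        rows.append({"gpu": idx, "name": name, "util_pct": int(util),
                     "mem_used_mib": int(used), "mem_total_mib": int(total)})
    return rows


if __name__ == "__main__":
    for r in gpu_report():
        print(f"GPU{r['gpu']} {r['name']}: {r['util_pct']}% util, "
              f"{r['mem_used_mib']}/{r['mem_total_mib']} MiB")
```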
AI / ML Platform Stack
LLM inference: vLLM, NVIDIA Triton Inference Server
Vector databases: Elasticsearch, Milvus (see the retrieval sketch after this list)
AI enablement tools: Dataiku, Snorkel
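To show the retrieval path such a stack typically serves, here is a rough sketch of a vector similarity search with pymilvus; the Milvus URI, collection name, and field names are hypothetical, and the query embedding is assumed to come from an upstream embedding service.

```python
"""Vector similarity search behind a RAG workload (sketch).

Assumptions (hypothetical): a Milvus instance at localhost:19530,
a collection "docs" with an embedding field and a "text" field,
and a query embedding produced by an upstream embedding service.
"""
from pymilvus import MilvusClient


def retrieve_context(query_embedding: list[float], top_k: int = 5) -> list[str]:
    client = MilvusClient(uri="http://localhost:19530")  # placeholder URI
    hits = client.search(
        collection_name="docs",       # placeholder collection name
        data=[query_embedding],       # one query vector
        limit=top_k,
        output_fields=["text"],       # placeholder stored-text field
    )
    # `hits` holds one result list per query vector; keep the stored text snippets
    return [hit["entity"]["text"] for hit in hits[0]]
```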
Agentic AI (Integration Context)
Support infrastructure enabling agent-based AI workflows via enterprise gateways
Understand interactions between LLMs, retrieval systems, and inference services
Focus on platform stability and integration, not core agent development
Required Qualifications
Strong experience managing Linux-based production environments
Hands-on expertise with NVIDIA GPU platforms (A100/H100; B200 preferred)
Experience with GPU scheduling tools like NVIDIA RunAI
Deep operational knowledge of Kubernetes in GPU environments
Proficiency in Python and Bash scripting
Experience with LLM inference platforms such as vLLM or NVIDIA Triton Inference Server
Experience managing vector databases like Elasticsearch or Milvus
Experience working in regulated or security-sensitive environments
Preferred Qualifications
Familiarity with the NVIDIA AI ecosystem (CUDA, NGC containers, NeMo, NeMo Guardrails)
Exposure to agent-based AI systems and enterprise AI platforms
Experience supporting GPU-intensive visualization workloads (RTX A6000)
Key Competencies
Strong problem-solving and troubleshooting skills
Ability to manage high-performance, mission-critical systems
Effective communication and stakeholder collaboration
Ownership mindset with focus on reliability and continuous improvement
Required Experience:
Individual Contributor (IC)