Cloud Infrastructure Engineer

Key2Source

Job Location:

Charlotte, NC - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Job Title: Cloud Infrastructure Engineer

Location: Charlotte, NC (5 days onsite)

Duration: 12 months

Primary Skills

vLLM
TensorRT-LLM
Triton Inference Server
SGLang
Kubernetes ML Serving
KServe
OpenShift AI
GPU Orchestration
GCP
Terraform

Key Responsibilities

Design and manage scalable AI/ML infrastructure for GenAI and LLM workloads.
Deploy and optimize LLM inference pipelines using vLLM, TensorRT-LLM, Triton Inference Server, and SGLang.
Implement inference optimization techniques including:

Continuous Batching
Speculative Decoding
KV Cache / Prefix Caching
FP8 / AWQ / GPTQ quantization
Tensor Parallelism

Build and maintain Kubernetes-based ML serving platforms using KServe and OpenShift AI.
Manage GPU orchestration and scheduling using technologies such as Run:AI, CUDA, NCCL, and MIG.
Develop Helm charts, Kubernetes Operators, and platform automation for AI workloads.
Conduct performance benchmarking and optimization for GPU-based inference systems.
Implement monitoring and observability using Prometheus and Grafana.
Collaborate with data science and ML engineering teams to productionize LLMs.
Automate infrastructure provisioning and deployment using Terraform.

Required Qualifications

6 years of experience in cloud engineering or platform engineering.
Experience with LLMOps/MLOps platforms.
Strong hands-on experience with Kubernetes and containerized AI/ML workloads.
Experience with GPU infrastructure and distributed inference optimization.
Proficiency in GCP cloud services and cloud-native architecture.
Strong scripting/programming skills in Python.
Experience with ML observability and production monitoring tools.
Familiarity with OpenShift AI and enterprise Kubernetes ecosystems.

Preferred Qualifications

Knowledge of GenAI frameworks and RAG architectures.
Exposure to enterprise AI governance and security practices.
