Job Title: Cloud Infrastructure Engineer
Location: Charlotte, NC (5 days onsite)
Duration: 12 months
Primary Skills
- vLLM
- TensorRT-LLM
- Triton Inference Server
- SGLang
- Kubernetes ML Serving
- KServe
- OpenShift AI
- GPU Orchestration
- GCP
- Terraform
Key Responsibilities
- Design and manage scalable AI/ML infrastructure for GenAI and LLM workloads.
- Deploy and optimize LLM inference pipelines using vLLM, TensorRT-LLM, Triton Inference Server, and SGLang.
- Implement inference optimization techniques including:
  - Continuous batching
  - Speculative decoding
  - KV cache / prefix caching
  - FP8 / AWQ / GPTQ quantization
  - Tensor parallelism
- Build and maintain Kubernetes-based ML serving platforms using KServe and OpenShift AI.
- Manage GPU orchestration and scheduling using technologies such as Run:AI, CUDA, NCCL, and MIG.
- Develop Helm charts, Kubernetes Operators, and platform automation for AI workloads.
- Conduct performance benchmarking and optimization for GPU-based inference systems.
- Implement monitoring and observability using Prometheus and Grafana.
- Collaborate with data science and ML engineering teams to productionize LLMs.
- Automate infrastructure provisioning and deployment using Terraform.
Required Qualifications
- 6 years of experience in cloud engineering or platform engineering.
- Experience with LLMOps/MLOps platforms.
- Strong hands-on experience with Kubernetes and containerized AI/ML workloads.
- Experience with GPU infrastructure and distributed inference optimization.
- Proficiency in GCP cloud services and cloud-native architecture.
- Strong scripting/programming skills in Python.
- Experience with ML observability and production monitoring tools.
- Familiarity with OpenShift AI and enterprise Kubernetes ecosystems.
Preferred Qualifications
- Knowledge of GenAI frameworks and RAG architectures.
- Exposure to enterprise AI governance and security practices.