AI/ML Ops Engineer – Sanity Check (NVIDIA)

Key2Source

Job Location:

Charlotte, NC - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1

Job Title: AI/ML Ops Engineer – Sanity Check (NVIDIA)

Location: Charlotte, NC (Onsite)

Job Summary

We are seeking an experienced AI/ML Ops Engineer with strong expertise in NVIDIA GPU environments, AI/ML infrastructure, and sanity testing/validation processes. The ideal candidate will support deployment, monitoring, and operational validation of AI/ML workloads while ensuring system performance, stability, and reliability across GPU-based platforms.

Key Responsibilities

Perform sanity checks and validation for AI/ML models, pipelines, and GPU environments.
Manage and optimize NVIDIA GPU-based AI/ML infrastructure.
Monitor AI/ML workloads, troubleshoot issues, and ensure high system availability.
Work with MLOps tools for deployment automation and CI/CD processes.
Collaborate with AI engineers, DevOps, and infrastructure teams for production support.
Analyze logs, performance metrics, and system behavior to identify bottlenecks.

Required Skills

Strong experience in AI/ML Ops or MLOps environments.
Experience with sanity testing.
Hands-on experience with NVIDIA GPUs, CUDA, TensorRT, or related technologies.
Knowledge of Kubernetes, Docker, Linux, and cloud platforms (AWS/Azure/GCP).
Experience with Python scripting and automation tools.
Familiarity with monitoring, testing, and sanity validation processes for AI systems.
Experience with ML model deployment and performance tuning.
Understanding of CI/CD pipelines and infrastructure automation.

Primary Skills

GCP
Azure
Terraform
Kubernetes
Python
GenAI Platforms
Arize AI
Claude Cowork
HashiCorp Vault
LLMs
RAG
