AI/ML Ops Engineer – Sanity Check (NVIDIA)

Key2Source


Job Location:

Charlotte, NC - USA

Monthly Salary: Not Disclosed
Posted on: 7 hours ago
Vacancies: 1 Vacancy

Job Summary

Job Title: AI/ML Ops Engineer – Sanity Check (NVIDIA)

Location: Charlotte, NC (Onsite)

We are seeking an experienced AI/ML Ops Engineer with strong expertise in NVIDIA GPU environments, AI/ML infrastructure, and sanity testing and validation processes. The ideal candidate will support the deployment, monitoring, and operational validation of AI/ML workloads while ensuring system performance, stability, and reliability across GPU-based platforms.

Key Responsibilities

  • Perform sanity checks and validation for AI/ML models, pipelines, and GPU environments.
  • Manage and optimize NVIDIA GPU-based AI/ML infrastructure.
  • Monitor AI/ML workloads, troubleshoot issues, and ensure high system availability.
  • Work with MLOps tools for deployment, automation, and CI/CD processes.
  • Collaborate with AI engineers, DevOps, and infrastructure teams for production support.
  • Analyze logs, performance metrics, and system behavior to identify bottlenecks.

Required Skills

  • Strong experience in AI/ML Ops or MLOps environments.
  • Experience with sanity testing.
  • Hands-on experience with NVIDIA GPUs, CUDA, TensorRT, or related technologies.
  • Knowledge of Kubernetes, Docker, Linux, and cloud platforms (AWS/Azure/GCP).
  • Experience with Python scripting and automation tools.
  • Familiarity with monitoring, testing, and sanity validation processes for AI systems.
  • Experience with ML model deployment and performance tuning.
  • Understanding of CI/CD pipelines and infrastructure automation.

Primary Skills

  • GCP
  • Azure
  • Terraform
  • Kubernetes
  • Python
  • GenAI Platforms
  • Arize AI
  • Claude Cowork
  • HashiCorp Vault
  • LLMs
  • RAG