AI Infrastructure Engineer NVIDIA GPU

Job Location:

Lake Mary, FL - USA

Monthly Salary: Not Disclosed
Posted on: 8 hours ago
Vacancies: 1 Vacancy

Job Summary

In this role, you'll make an impact in the following ways:

  • Be hands-on with enterprise-grade NVIDIA AI infrastructure, supporting GPU-based compute, high-performance storage, and network systems designed for ML/AI at scale.
  • Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes, Docker, and GPU orchestration tools like Run:AI and NVIDIA BCM.
  • Own the observability of our AI platforms: monitor health, identify performance bottlenecks, and make strategic recommendations to drive platform reliability and maturity.
  • Automate infrastructure operations and provisioning using Python, Bash, and tools like Terraform or Ansible to reduce manual toil and accelerate experimentation.
  • Maintain and scale AI training and inference pipelines, integrating infrastructure workflows into CI/CD systems to enable seamless, automated deployment of AI workloads.

To be successful in this role, we're seeking the following:

  • Bachelor's degree in computer science or a related discipline, or equivalent work experience, required; advanced degree preferred.
  • 8-10 years of related experience required; experience in the securities or financial services industry is a plus.
  • Experience with Linux administration (RHEL/Ubuntu), shell scripting, and system-level debugging.
  • Proven experience running distributed systems in Kubernetes and containerized environments using Docker.
  • Familiarity with GPU resource management, including NVIDIA GPU Operator and device plugin lifecycle.
  • Experience with CI/CD workflows and infrastructure automation tools such as GitLab CI, Jenkins, Terraform, Helm, or Ansible.
  • Knowledge of networking fundamentals and persistent storage systems.
  • Exposure to cloud platforms (AWS, GCP, Azure) and hybrid GPU environments.
  • Ability to read and support Python code focused on ML/AI pipeline integration.
  • Strong analytical and troubleshooting skills with a collaborative mindset.
  • Effective communication skills and proactive ownership of platform reliability and performance.

Regards,

Mohammed Ilyas

PH - or Text - or you can share the updated resume at com


Additional Information:

All your information will be kept confidential according to EEO guidelines.


Remote Work:

No


Employment Type:

Full-time

Key Skills

  • Jenkins
  • Ruby
  • Python
  • Active Directory
  • Cloud
  • PowerShell
  • Windows
  • AWS
  • Linux
  • SAN
  • Java
  • Troubleshoot
  • Backup
  • Puppet
  • Hardware

About Company

We provide recruitment and staffing services to many industries and domains through our innovative and customized solutions and passionate commitment to research. Our ability to understand hiring strategies, availability of talent, and compensation benchmarking makes us proud hiring par ...