Linux Admin

Innovitusa


Job Location: Jackson, MO - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Hiring: W2 Candidates Only


Visa: Open to any visa type with valid work authorization in the USA

System Management: Administer and maintain Linux-based servers and clusters
optimized for GPU compute workloads, ensuring high availability and performance.
GPU Infrastructure: Configure, monitor, and troubleshoot GPU hardware (e.g., NVIDIA
GPUs) and related software stacks (e.g., CUDA, cuDNN) for optimal performance in
AI/ML and HPC applications.
Troubleshooting: Diagnose and resolve hardware and software issues on GPU
compute nodes, as well as performance problems in GPU clusters.
High-Speed Interconnects: Implement and manage high-speed networking
technologies such as RDMA over Converged Ethernet (RoCE) to support low-latency,
high-bandwidth communication for GPU workloads.
CI/CD Pipelines: Build and optimize continuous integration and deployment (CI/CD)
pipelines for testing GPU-based servers and managing deployments using tools like
GitHub Actions.
Monitoring & Performance: Set up and maintain monitoring, logging, and alerting
systems (e.g., Prometheus, VictoriaMetrics, Grafana) to track system performance,
GPU utilization, resource bottlenecks, and the uptime of GPU resources.
Security and Compliance: Implement network security measures, including firewalls,
VLANs, VPNs, and intrusion detection systems, to protect the GPU compute
environment and comply with standards such as SOC 2 or ISO 27001.



Required Qualifications


Experience: 8 years of experience in DevOps, Site Reliability Engineering (SRE), or
cloud infrastructure management, with at least 5 years working on GPU-based compute
environments in the cloud.
Linux Administration: Strong knowledge of Linux system administration for managing
network services and tools in a GPU compute environment.
High-Speed Interconnects: Experience with high-performance networking technologies
such as RoCE or 100GbE in compute-intensive environments.
GPU-Specific Networking: Proficiency with NVIDIA GPU networking technologies,
such as Mellanox ConnectX adapters, including managing their drivers and firmware
and configuring their interfaces with Netplan.
Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS,
Azure, GCP).
Networking & Security: Knowledge of networking concepts (e.g., VPCs, subnets) and
security best practices (e.g., IAM, encryption, firewall configurations).


Key Skills

  • Air Freight
  • Accounting & Finance
  • Electrical Commissioning
  • General Services
  • Civil Engineering
  • Linux