HPC consultant

Not Interested
Bookmark
Report This Job

profile Job Location:

Fremont, CA - USA

profile Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Job Summary

Role- HPC consultant

Location- Fremont CA/ Tualatin OR

HPC Cluster & Scheduler Management

  • Design configure tune and optimize SLURM partitions queues QoS and scheduling policies to maximize cluster utilization and workload efficiency.
  • Perform in-depth analysis of job scheduling behavior bottlenecks and resource contention.
  • Troubleshoot job failures performance degradation and scheduler-related issues in production HPC environments.
  • Implement fair-share backfill reservations and policy-driven scheduling as required.

Storage Benchmarking & Procurement Support

  • Lead HPC storage performance benchmarking using industry-standard tools (e.g. IOR FIO MDTest IOzone).
  • Analyze I/O patterns of HPC workloads and map them to appropriate storage architectures (parallel file systems NVMe Lustre Spectrum Scale etc.).
  • Provide technical input for storage selection and procurement including performance expectations sizing and cost-performance tradeoffs.
  • Collaborate with vendors and internal teams during POCs and performance validation exercises.

HPC Application Build & Optimization

  • Build install configure and maintain HPC applications compilers libraries and scientific software stacks.
  • Optimize application performance using MPI OpenMP GPU acceleration (where applicable) and tuned math libraries.
  • Support multiple compiler toolchains (GCC Intel LLVM NVIDIA HPC SDK etc.).
  • Implement and manage environment modules (Lmod) or similar software management frameworks.

System Performance & Operations

  • Conduct system-level performance tuning across compute memory network and storage layers.
  • Diagnose node-level issues involving CPU GPU interconnects (InfiniBand/Ethernet) and OS configurations.
  • Create operational runbooks performance baselines and troubleshooting documentation.
  • Support cluster upgrades expansions and hardware refresh activities.

Collaboration & Delivery

  • Work closely with application owners researchers and infrastructure teams to meet aggressive delivery timelines.
  • Translate workload requirements into practical HPC configurations and optimizations.
  • Provide clear technical guidance and recommendations to leadership and stakeholders.

Required Skills & Experience

Core HPC Skills

  • 8 12 years of hands-on HPC engineering experience in production environments.
  • Strong expertise with SLURM (configuration tuning troubleshooting).
  • Solid understanding of Linux systems (RHEL/CentOS/Rocky/Alma preferred).
  • Deep knowledge of HPC storage systems and I/O performance analysis.
  • Proven experience building and optimizing HPC applications and libraries.

Technical Proficiency

  • MPI implementations (Open MPI MPICH) OpenMP
  • Compilers and toolchains (GCC Intel NVIDIA HPC SDK)
  • Performance tools (perf vtune nvprof/nsys IB diagnostics)
  • Environment modules (Lmod) package managers (Spack preferred)
  • Bash/Python scripting for automation and diagnostics

Nice to Have

  • Experience with GPU-based HPC workloads (NVIDIA CUDA ROCm).
  • Exposure to cloud-based HPC (Azure AWS GCP).
  • Familiarity with parallel file systems such as Lustre or IBM Spectrum Scale.
  • Vendor engagement experience for HPC hardware/storage evaluations.

Role- HPC consultant Location- Fremont CA/ Tualatin OR HPC Cluster & Scheduler Management Design configure tune and optimize SLURM partitions queues QoS and scheduling policies to maximize cluster utilization and workload efficiency. Perform in-depth analysis of job scheduling behavior bottlenec...
View more view more