HPC consultant

VDart Inc

Job Location:

Fremont, CA - USA

Monthly Salary: Not Disclosed
Posted on: 6 hours ago
Vacancies: 1 Vacancy

Job Summary

Role: HPC consultant
Location: Fremont, CA / Tualatin, OR
Contract
HPC Cluster & Scheduler Management
Design, configure, tune, and optimize SLURM partitions, queues, QoS, and scheduling policies to maximize cluster utilization and workload efficiency.
Perform in-depth analysis of job scheduling behavior, bottlenecks, and resource contention.
Troubleshoot job failures, performance degradation, and scheduler-related issues in production HPC environments.
Implement fair-share, backfill, reservations, and policy-driven scheduling as required.
Storage Benchmarking & Procurement Support
Lead HPC storage performance benchmarking using industry-standard tools (e.g., IOR, FIO, MDTest, IOzone).
Analyze I/O patterns of HPC workloads and map them to appropriate storage architectures (parallel file systems, NVMe, Lustre, Spectrum Scale, etc.).
Provide technical input for storage selection and procurement, including performance expectations, sizing, and cost-performance tradeoffs.
Collaborate with vendors and internal teams during POCs and performance validation exercises.
HPC Application Build & Optimization
Build, install, configure, and maintain HPC applications, compilers, libraries, and scientific software stacks.
Optimize application performance using MPI, OpenMP, GPU acceleration (where applicable), and tuned math libraries.
Support multiple compiler toolchains (GCC, Intel, LLVM, NVIDIA HPC SDK, etc.).
Implement and manage environment modules (Lmod) or similar software management frameworks.
System Performance & Operations
Conduct system-level performance tuning across compute, memory, network, and storage layers.
Diagnose node-level issues involving CPU, GPU, interconnects (InfiniBand/Ethernet), and OS configurations.
Create operational runbooks, performance baselines, and troubleshooting documentation.
Support cluster upgrades, expansions, and hardware refresh activities.
Collaboration & Delivery
Work closely with application owners, researchers, and infrastructure teams to meet aggressive delivery timelines.
Translate workload requirements into practical HPC configurations and optimizations.
Provide clear technical guidance and recommendations to leadership and stakeholders.
Required Skills & Experience
Core HPC Skills
8-12 years of hands-on HPC engineering experience in production environments.
Strong expertise with SLURM (configuration, tuning, troubleshooting).
Solid understanding of Linux systems (RHEL/CentOS/Rocky/Alma preferred).
Deep knowledge of HPC storage systems and I/O performance analysis.
Proven experience building and optimizing HPC applications and libraries.
Technical Proficiency
MPI implementations (Open MPI, MPICH), OpenMP
Compilers and toolchains (GCC, Intel, NVIDIA HPC SDK)
Performance tools (perf, VTune, nvprof/nsys, IB diagnostics)
Environment modules (Lmod), package managers (Spack preferred)
Bash/Python scripting for automation and diagnostics
Nice to Have
Experience with GPU-based HPC workloads (NVIDIA CUDA, ROCm).
Exposure to cloud-based HPC (Azure, AWS, GCP).
Familiarity with parallel file systems such as Lustre or IBM Spectrum Scale.
Vendor engagement experience for HPC hardware/storage evaluations.