HPC (Engineer) Charlottesville, VA
Charlottesville, VA - USA
Job Summary
ATTENTION MILITARY-AFFILIATED JOB SEEKERS - Our organization works with partner companies to source qualified talent for their open roles. The following position is available to Veterans, Transitioning Military, National Guard and Reserve Members, Military Spouses, Wounded Warriors, and their Caregivers. If you have the required skill set, education, and experience, please click the submit button and follow the next steps. All positions are onsite unless otherwise stated.
One of our government contracting clients has a full-time opening in Charlottesville, VA for a TS/SCI-cleared HPC Engineer to assist users executing computational workloads within secure High Performance Computing (HPC) environments. The HPC Engineer will work directly with engineers, analysts, and researchers to support job execution, troubleshoot workload failures, and improve the performance and efficiency of compute workloads running on HPC clusters.
The Engineer will assist users with scheduler job scripts, application execution, and workload performance troubleshooting while promoting HPC best practices for efficient cluster utilization. This role serves as the primary interface between mission users and HPC platform infrastructure teams.
This position requires strong Linux experience, scripting capability, and familiarity with distributed computing environments supporting scientific or engineering workloads.
This position is onsite in Charlottesville, VA.
Required:
- TS/SCI Clearance
- Ability to obtain DoD 8140 (8570) IAT Level II certification
Responsibilities:
- Provide user support for computational workloads running on HPC clusters in classified and unclassified environments.
- Assist users in developing, submitting, and troubleshooting scheduler job scripts for systems such as Slurm or PBS, including resource allocation for CPU, GPU, and distributed compute workloads.
- Troubleshoot slow, hanging, or failing HPC jobs, including MPI-based distributed workloads, GPU jobs, and large-scale parallel applications.
- Support users compiling and executing scientific modeling or data processing applications within Linux-based HPC environments.
- Provide guidance on HPC best practices for job scheduling, compute resource allocation, and workload performance.
- Monitor workload execution patterns and provide guidance to improve cluster throughput and resource utilization.
- Develop scripts or tools using Bash or Python to automate common operational tasks.
- Maintain documentation and knowledge base articles describing system capabilities, job execution procedures, and troubleshooting guidance.
- Support performance analysis of compute workloads to identify inefficiencies or configuration issues.
- Coordinate with HPC systems engineers when infrastructure or cluster configuration issues impact workload performance.
- Provide responsive on-site support for users executing HPC workloads in mission environments.
- Maintain source-controlled scripts and tools using Git or similar version control platforms.
- Assist users with environment modules and runtime environments required for executing HPC applications.
Qualifications:
- BS degree in Engineering, Computer Science, or related STEM field
- Experience may be substituted for degree
- Minimum 5 years of Linux experience, including command-line system usage, scripting, and troubleshooting applications in multi-user server environments.
- Professional experience administering or supporting command-line Linux systems (RHEL derivatives preferred).
- Experience developing scripts using Bash, Python, or similar scripting languages.
- Experience troubleshooting software execution issues in distributed computing environments.
- Working knowledge of job scheduling systems such as Slurm, PBS, Torque, or similar platforms.
- Experience supporting users in technical computing or engineering environments.
- Strong troubleshooting and analytical skills.
- Ability to communicate technical concepts clearly to both technical and non-technical users.
- Active TS/SCI security clearance.
- Experience as a user or administrator of HPC clusters.
- Experience supporting parallel computing frameworks such as MPI, OpenMP, or CUDA-based GPU workloads.
- Experience supporting scientific or engineering applications requiring large-scale compute resources.
- Experience using performance monitoring and optimization tools for compute workloads.
- Experience compiling applications using C, C++, Fortran, or Python-based environments.
- Experience working in classified computing environments.
- Experience supporting GPU-enabled workloads.
Required Experience:
IC
About Company
VetJobs & Military Spouse Jobs works with our employer partners to source, screen, and move qualified talent to the desktops of the Hiring Managers. Application is a two-step process, so please be patient with the team. When you submit to a position on our site your information will ...