Director, HPC Infrastructure Engineering
Palo Alto, CA - USA
Job Summary
Company Description
Guardant Health is a leading precision oncology company focused on guarding wellness and giving every person more time free from cancer. Founded in 2012, Guardant is transforming patient care and accelerating new cancer therapies by providing critical insights into what drives disease through its advanced blood and tissue tests, real-world data, and AI analytics. Guardant tests help improve outcomes across all stages of care, including screening to find cancer early, monitoring for recurrence in early-stage cancer, and treatment selection for patients with advanced cancer. For more information, visit the company website and follow the company on LinkedIn, X (Twitter), and Facebook.
Guardant Health's High-Performance Computing (HPC) team builds and operates the computational technology infrastructure backbone of the company.
This includes scalable data storage that holds petabytes of genomics data, high-performance compute clusters running a custom bioinformatics pipeline in production and R&D environments, and the software infrastructure that hosts an ecosystem of services for internal data processing and external data. To facilitate Guardant Health's fast growth in the next few years, the HPC team is seeking a strong technical engineering leader who can help maintain and grow the HPC infrastructure during its expansion while partnering with other engineering functions (Corporate IT, SQA, and DevOps/SRE) as well as the R&D user community and Lab Operations. This is a hands-on technical leadership position that will leverage your expertise in HPC environments as well as your experience leading and managing a team.
Role: Director, HPC Infrastructure Engineering
Location: Preference is given to candidates located in the San Francisco Bay Area with the ability to work onsite in Redwood City and Palo Alto; however, the role offers partial remote flexibility.
Onsite presence is required during rotational coverage, scheduled maintenance windows, and cluster deployment activities.
In this role you will primarily lead an engineering team to:
- Oversee and manage the HPC environment, including compute, storage, network, physical infrastructure, and software, serving multiple Production and Development clusters
- Integrate HPC systems with on-prem and cloud-based systems and data sources as required
- Administer multiple HPC clusters and associated cluster file systems
- Research, design, and implement next-generation HPC solutions
- Diagnose and resolve production system stack issues, leveraging software utilities down to the source code level (e.g., shell scripts, Python, etc.)
- Maintain and monitor infrastructure and facilities to ensure operational stability
- Drive continuous improvement initiatives to enhance reliability and performance as workloads and data volumes scale
- Ensure control, integrity, and accessibility across systems and applications serving multiple concurrent users
- Provide operational oversight for systems at remote and international locations
- Collaborate with offsite consultants to sustain and optimize infrastructure performance
- Partner with vendors to procure, troubleshoot, upgrade, repair, and replace systems as needed
- Foster a culture of continuous engineering improvement through design and architecture review, mentoring, feedback, and the development and monitoring of key performance metrics
- Hire, coach, and mentor individuals; build a strong cross-functional organization
- Partner with a diverse customer base to understand requirements, priorities, and processes
- Propose and implement new projects or recommend system improvements
- Observe quality standards appropriate for an FDA-governed and CLIA/CAP-compliant diagnostic laboratory
- Manage budgets to balance the refresh of obsolete equipment and software against scaling to support company growth, using fixed headcount and contractor/consulting resources
- Participate in a 24/7 on-call rotation
Required:
- B.S. in Computer Science or a related technical field, or equivalent experience
- 10 years of experience with high-performance computing platforms, preferably in organizations handling large volumes of sequenced genomic data within a commercial enterprise
- Experience with software-defined infrastructure and cloud computing (Google Cloud Platform, Amazon Web Services (AWS), etc.)
- Experience managing GPUs and petabyte-scale storage platforms
- Design, deployment, support, and troubleshooting experience in a complex computing environment
- HPC Engineering team management experience (either directly or in a matrixed environment)
- 4 years of networking experience, with CCNA certification or better
- 4 years of Linux/Unix system administration, with knowledge of Unix network protocols, TCP/IP, core infrastructure technologies, and virtualization
- 2 years of experience with large-scale data storage and compute cluster (HPC) infrastructure
- 2 years working in and with on-premises and cloud-based (AWS, Google, IBM, and Azure) data centers
- 2 years of building software release and ops processes and automation toolsets
- 2 years providing documentation of system administration
Preferred:
- Proficiency with Arista and compatible networking up to and including 400 Gb/s links
- Hands-on administration of IBM's General Parallel File System (GPFS)
- Operational oversight of the Slurm scheduler
- Working knowledge of cloud bursting technologies
- Familiarity with wide area file systems
- Practical expertise in Docker and container technologies
- Working experience with Kubernetes
- Operation of infrastructure compliant with HIPAA and SOX standards
Success Profile:
- Excels in agile, high-velocity technical environments.
- Demonstrates self-leadership and a commitment to advancing both individual and team expertise.
- Combines engineering rigor with pragmatic adaptability.
- Successfully manages operational SLAs while leading initiatives critical to business growth.
Hybrid Work Model: This section is applicable to onsite employees who are eligible for hybrid work location as specified by management and related policies. Guardant has defined days for in-person/onsite collaboration and work-from-home days for individual-focused time. All U.S. employees who live within 50 miles of a Guardant facility will be required to be onsite on Mondays, Tuesdays, and Thursdays. We have found that aligning our scheduled in-office days allows our teams to do their best work and creates the focused thinking time our innovative work requires. At Guardant, our work model has created flexibility for better work-life balance while keeping teams connected to advance our science for our patients.
Employee may be required to lift routine office supplies and use office equipment. Most of the work is performed in a desk/office environment; however, there may be exposure to high noise levels, fumes, and biohazard material in the laboratory. Employee must be able to sit for extended periods of time.
Guardant Health is committed to providing reasonable accommodations in our hiring processes for candidates with disabilities, long-term conditions, mental health conditions, or sincerely held religious beliefs. If you need support, please reach out to
A background screening, including criminal history, is required for this role. GH will consider qualified applicants with criminal arrest or conviction histories in a manner consistent with applicable law, including but not limited to the LA County Fair Chance Policies and the Fair Chance Act (Gov. Code Section 12952).
Guardant Health is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.
All your information will be kept confidential according to EEO guidelines.
To learn more about the information collected when you apply for a position at Guardant Health, Inc. and how it is used, please review our Privacy Notice for Job Applicants.
Please visit our career page at:
Experience: Director