Lead HPC Hardware Engineer

Job Location

Dallas - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm with offices in London and Dallas.

We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub, where we work on the latest technologies in a cutting-edge environment.

The role

As G-Research's Lead HPC Hardware Engineer, you will play a critical role in managing, scaling and optimizing a large compute infrastructure composed of numerous GPU and CPU nodes.

In this role you will work closely with Infrastructure Engineers, Data Centre Operations, AI Engineers, Security Experts and Software Engineers to deliver a robust compute platform that supports high-performance computing needs.

Your expertise will be pivotal in ensuring that our compute infrastructure operates efficiently while also planning for its growth and maintenance.


Our approach is centred on automation, hardware optimisation and infrastructure best practices. You will help drive improvements, mentor junior engineers and ensure our infrastructure is both secure and scalable.

Key responsibilities of the role include:

  • Designing, configuring and managing a high-performance compute infrastructure
  • Growing and optimizing our infrastructure to meet business demands
  • Ensuring the efficient operation of the OpenStack-powered environment, with a primary focus on OpenStack Ironic
  • Monitoring hardware performance, identifying areas for improvement and implementing solutions
  • Developing and maintaining hardware management procedures to increase server uptime and minimise failures
  • Performing diagnostics, tuning and capacity planning to ensure smooth scale-out
  • Performing analysis of existing hardware lifecycle processes and providing recommendations for improvement and optimization
  • Collaborating with various teams to integrate hardware improvements aligned to organizational goals
  • Implementing best practices for security hardening of the platform and associated systems
  • Mentoring junior engineers and fostering a culture of continuous learning and improvement

Who are we looking for?

The ideal candidate will have the following skills and experience:

  • Demonstrable experience managing large-scale HPC infrastructure
  • Strong understanding of server hardware architecture, including processors, memory, storage, networking and power systems
  • Deep understanding of bare-metal provisioning and infrastructure automation
  • Proven ability to troubleshoot hardware issues, including diagnostics and repairs for both GPU and CPU nodes in production environments
  • Experience with hardware monitoring and management tools, and familiarity with hardware automation techniques and tools such as Ansible, Puppet and Chef
  • Knowledge of the Redfish API and related out-of-band management interfaces, including iDRAC, iLO, BMC and IPMI
  • Experience with hardware diagnostics, optimization, performance tuning and capacity planning
  • Familiarity with thermal management and optimizing data centre layout for efficiency
  • Knowledge of security best practices for hardware infrastructure
  • Strong problem-solving skills, with the ability to work under pressure in a fast-paced environment
  • Excellent communication skills and the ability to work collaboratively with cross-functional teams

The following would be beneficial:

  • Experience with large compute farms or hyperscale data centres
  • Familiarity with high-performance networking such as InfiniBand and Ethernet
  • Knowledge of server configuration management and software deployment in HPC environments
  • Understanding of Linux-based environments and proficiency in scripting languages such as Python, Bash or PowerShell for automation
  • Experience with OpenStack or similar cloud platforms
  • Experience with NVIDIA-SMI and debugging GPU-related issues
  • Leadership experience, including team management, mentoring and developing engineers

Why should you apply?

  • Market-leading compensation plus annual discretionary bonus
  • Lunch provided in the office (via GrubHub)
  • Informal dress code and excellent work/life balance
  • Excellent paid time off allowance of 25 days
  • Sick days, military leave, and family and medical leave
  • Generous 401(k) plan
  • 16 weeks of fully paid parental leave
  • Medical and Prescription, Dental, and Vision insurance
  • Life and Accidental Death & Dismemberment (AD&D) insurance
  • Employee Assistance and Wellness programs
  • Generous relocation allowance and support
  • Great selection of office snacks and hot and cold drinks
  • Onsite gym and car parking

G-Research is committed to cultivating and preserving an inclusive work environment. We are an ideas-driven business and we place great value on diversity of experience and opinions.

We want to ensure that applicants receive a recruitment experience that enables them to perform at their best. If you have a disability or special need that requires accommodation, please let us know in the relevant section.

Employment Type

Full-Time
