At-Scale Hardware System Validation and BKC Test Lead

Graphcore

Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

About us

Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software, and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers, and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

We are seeking an experienced At-Scale Hardware System Validation and BKC Test Lead to drive automation-based validation execution for hyperscale AI hardware platforms. This role focuses on delivering large-scale system validation across blade-level and rack-level AI infrastructure.

The successful candidate will define and execute automation-driven validation strategies to ensure robust hardware, firmware, and system integration. The role is responsible for delivering Best Known Configuration (BKC) for AI server platforms and ensuring readiness for hyperscale data center deployments.

The Team

The Systems Validation team ensures Graphcore's AI hardware platforms are validated at scale across blade-level, rack-level, and data center environments.

The team collaborates closely with silicon enablement, system architecture, firmware, automation infrastructure, rack integration, and operations teams to deliver reliable and scalable AI infrastructure platforms.

Responsibilities and Duties

  • Own the end-to-end validation strategy for at-scale test execution across AI hardware platforms.
  • Ensure comprehensive validation coverage across blade-level systems and rack-level infrastructure, including power, cooling, networking, and thermal subsystems.
  • Act as the primary technical liaison with automation teams to integrate validation infrastructure and execution environments.
  • Drive validation plans that deliver qualified Best Known Configuration (BKC) for hardware and firmware solutions.
  • Define BKC validation criteria across system components, including CPU, GPU, DDR, PCIe, storage, networking, and system management controllers.
  • Lead debug and failure analysis across internal engineering teams and ODM partners.
  • Develop validation dashboards, coverage metrics, and reporting frameworks for engineering and leadership visibility.
  • Partner with architecture, silicon enablement, firmware, rack integration, and operations teams to ensure system readiness for production.
  • Support collaboration with ODM partners to ensure effective validation execution and issue resolution.

Candidate Profile

Essential

  • Bachelor's or Master's degree in Electrical Engineering, Computer Engineering, or related discipline.
  • 15 years of experience in server hardware validation or system engineering.
  • Experience designing or implementing automation infrastructure for at-scale validation execution.
  • Proven experience validating blade-level and rack-level server platforms in hyperscale environments.
  • Experience validating integrated HW/FW/SW server solutions across the product lifecycle.
  • Strong knowledge of high-speed interfaces such as PCIe, CXL, DDR, NVLink, and Ethernet.
  • Experience working with system firmware, including UEFI, BMC firmware, and rack management solutions.
  • Demonstrated success leading complex hardware debug and failure analysis across cross-functional teams.

Desirable

  • Experience with ARM-based or x86 server architectures.
  • Background in rack integration validation and hyperscale data center deployments.
  • Experience building automation-driven validation frameworks and test analytics systems.
  • Strong leadership and program coordination skills across complex engineering programs.