AI Hardware Systems Validation Architect

Graphcore


Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed
Posted on: 3 days ago
Vacancies: 1 Vacancy


About us

Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software, and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers, and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

We are seeking an experienced AI HW Systems Validation Architect to serve as the technical authority for validation of next-generation AI server and rack-scale platforms.

This role defines and drives the end-to-end validation architecture across blade-level and rack-level systems. The successful candidate will ensure comprehensive validation coverage across functional, electrical, networking, stress, and thermal domains to enable reliable hyperscale AI infrastructure deployments.

The Team

Graphcore is a globally recognised leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and datacentre hardware that provide the specialised processing power needed to drive AI innovation while delivering the efficiency required to support its broader adoption.

The Systems Engineering and Platform Validation team ensures Graphcore's AI compute platforms are validated and production-ready for hyperscale data center environments.

The team collaborates closely with silicon enablement, hardware architecture, firmware, system integration, and operations teams to validate complex server and rack-level systems and ensure platform reliability, performance, and scalability.

Responsibilities and Duties

  • Own the end-to-end validation methodology and technical strategy for AI hardware platforms across blade-level and rack-level systems.
  • Drive validation of rack-scale platforms covering functional, power, cooling, networking fabric, and system reliability.
  • Collaborate with rack validation teams to validate full rack configurations, power distribution, cooling loop integration, and system reliability.
  • Define and lead execution of comprehensive validation test plans for internal teams and ODM validation partners.
  • Ensure validation coverage aligns with architectural, electrical, and mechanical specifications across CPU, GPU, DDR, PCIe, storage, and networking interfaces.
  • Oversee liquid cooling validation, including performance, leak detection, and long-term reliability of cooling hardware.
  • Lead debug and issue management across cross-functional engineering teams and external partners.
  • Establish validation dashboards, coverage metrics, and quality indicators to monitor execution progress.
  • Partner with architecture, silicon enablement, firmware, and operations teams to ensure robust system bring-up and production readiness.

Candidate Profile

Essential

  • Bachelor's or Master's degree in Electrical Engineering, Computer Engineering, or a related discipline.
  • 15 years of experience in server hardware validation or system engineering.
  • Proven experience validating board, blade, and rack-level server hardware platforms.
  • Strong knowledge of high-speed interfaces such as PCIe, CXL, DDR, NVLink, and Ethernet.
  • Experience developing validation methodologies and large-scale validation test plans.
  • Experience leading debug and failure analysis across complex systems.
  • Experience managing ODM validation programs including test planning and issue tracking.
  • Familiarity with liquid cooling validation and system-level thermal reliability.

Desirable

  • Experience with ARM-based or x86 server architectures.
  • Background in rack integration testing and hyperscale deployment readiness.
  • Experience with automated validation frameworks and test data analytics.
  • Strong program leadership and cross-functional collaboration skills.

Required Experience:

Staff IC

