AI HW Systems Engineering and Debug Lead
Austin, TX - USA
About us
Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software, and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.
As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.
Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers, and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.
Job Summary
We are seeking an experienced AI HW Systems Engineering and Debug Lead to drive system-level debug and bring-up activities for Graphcore's next-generation AI data center platforms.
The successful candidate will lead complex debug efforts across hardware, firmware, and software layers for blade and rack-level systems. This role focuses on developing scalable debug strategies, improving debug throughput, and ensuring timely resolution of system-level issues throughout the product lifecycle.
The Team
Graphcore is a globally recognised leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and datacentre hardware that provide the specialised processing power needed to drive AI innovation while delivering the efficiency required to support its broader adoption.
The Systems Engineering and Validation team ensures Graphcore's AI compute platforms are fully validated, debugged, and ready for deployment in hyperscale data center environments.
The team collaborates closely with silicon engineering, system architecture, firmware, operating system, and rack integration teams to identify and resolve system-level issues and drive improvements in validation and debug methodologies.
Responsibilities and Duties
- Own and develop AI systems debug methodology and system bring-up strategies for next-generation AI data center platforms.
- Lead system-level debug and root cause analysis for issues identified during server rack validation, post-silicon validation, and production phases.
- Drive complex debug efforts across silicon, hardware platforms, firmware, operating systems, and software stacks.
- Manage and track technical issues, risks, and priorities to ensure program milestones are achieved.
- Publish debug program indicators and metrics to identify roadblocks and improve debug throughput.
- Coordinate cross-functional teams, including system architecture, silicon, firmware, and validation teams, to resolve system-level issues.
- Lead development and integration of debug tools, scripts, and methodologies to improve debug efficiency.
- Communicate program status, risks, and technical findings to engineering leadership and stakeholders.
Candidate Profile
Essential
- Bachelor's or Master's degree in Electrical Engineering, Computer Engineering, or a related discipline.
- 15 years of experience working on complex systems engineering challenges involving HW/FW/SW debug in server or data center environments.
- Proven experience leading validation and debug for board, blade, and rack-level hardware platforms.
- Strong experience debugging OS, firmware, silicon, and hardware issues.
- Understanding of industry-standard system buses such as PCIe and CXL and their software stacks.
- Strong knowledge of ARM or x86 CPU architectures, SoC design, memory systems, and power management.
- Experience with system architecture, validation strategies, and complex system debug methodologies.
- Strong collaboration, communication, and cross-team coordination skills.
Desirable
- Experience designing or deploying AI/ML rack-scale systems.
- Experience developing at-scale debug methodologies for hyperscale data center systems.
- Familiarity with data center infrastructure and emerging AI hardware technologies.
- Experience with rack integration testing and hyperscale deployment readiness.
- Knowledge of automated validation frameworks, test analytics, and continuous validation practices.
Required Experience:
IC