Reliability Engineer

Graphcore

Job Location:

Taipei City - Taiwan

Monthly Salary: Not Disclosed
Posted on: 5 days ago
Vacancies: 1

Job Summary

About Graphcore

At Graphcore, we're building the future of AI. We are a team of semiconductor, software, and AI experts with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems in a place where everyone has the opportunity to make an impact on the company, our products, and the future of artificial intelligence.

Job Summary

Responsible for system-level reliability of AI servers with liquid cooling and HVDC architectures, owning reliability validation, shock & vibration robustness, and failure analysis from board to rack level to ensure safe transport, deployment, and long-term datacenter operation.

Key Responsibilities and Skills

  • Plan and execute reliability validation across board, server, and rack levels.
  • Define and run environmental, accelerated, and mechanical tests, including thermal/power cycling, humidity, corrosion, shock & vibration, and HALT/HASS.
  • Lead shock & vibration validation for transportation, handling, seismic, and operational conditions.
  • Assess reliability risks for liquid cooling systems (leakage, fatigue, pump life, corrosion, coolant stability).
  • Evaluate HVDC mechanical and electrical robustness (busbars, connectors, power interfaces).
  • Perform reliability prediction and life data analysis (Weibull, MTBF).
  • Lead cross-functional design reviews and drive risk mitigation.
  • Conduct failure analysis and RCA using standard FA methodologies.
  • Define and maintain reliability and S&V test specifications (JEDEC, Telcordia GR-63, JESD22, MIL-STD-810, ISTA, ASHRAE, UL, IEC).
  • Implement On-going Reliability Test (ORT) for production quality.
  • Document results and support customer audits and certifications.

Qualifications

  • Bachelor's or Master's degree in Mechanical, Electrical, Reliability, Materials, or related Engineering.
  • 10 years of reliability engineering experience in AI servers, datacenter systems, HPC, or complex electronics.
  • Hands-on experience with environmental, shock, and vibration testing.
  • Strong knowledge of reliability methodologies and statistical analysis.
  • Practical experience with liquid cooling and HVDC systems.
  • Proven failure analysis and RCA capability.
  • Strong communication skills in English; Mandarin a plus.

Preferred Experience

  • AI server architecture and large-scale liquid cooling systems.
  • FEA/modal analysis and test correlation.
  • Datacenter, telecom, and transportation standards knowledge.
  • Reliability certification (e.g. ASQ CRE).

Benefits

In addition to a competitive salary, Graphcore offers a competitive benefits package. We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.


Required Experience:

IC
