Senior Principal Network Engineer

Graphcore

Job Location: Austin, TX - USA

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1

About us

Graphcore is one of the world's leading innovators in Artificial Intelligence compute. It is developing hardware, software, and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers, and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

We are seeking a Senior Principal Network Engineer to help design, deploy, and optimize next-generation AI data center networks. AI training and inference workloads require networking environments with extremely high bandwidth, deterministic low latency, and zero packet loss.

In this role, you will partner closely with the Network Architecture Lead to design and scale high-performance computing (HPC) network fabrics supporting GPU clusters. You will work across hardware, networking, and AI application layers to ensure Graphcore's large-scale AI infrastructure operates at peak performance.

The ideal candidate brings deep experience operating hyperscale or HPC data center networks and has expertise in high-speed Ethernet fabrics, RDMA technologies, advanced automation, and telemetry systems.

The Team

The Data Center Network Engineering team designs and operates the high-performance network fabrics that power Graphcore's AI compute platforms. The team collaborates closely with hardware engineering, AI researchers, and infrastructure teams to build scalable networking environments optimized for distributed training and inference workloads.

Engineers work on pioneering technologies, including high-speed Ethernet fabrics, lossless networking, RDMA transport, and large-scale automation frameworks, to support next-generation AI clusters.

Responsibilities and Duties

  • Assist in defining ultra-high-bandwidth, non-blocking AI network fabrics (Clos spine-leaf-superspine architectures) for large-scale distributed AI workloads.
  • Optimize performance of lossless Ethernet fabrics using congestion control mechanisms such as PFC, ECN, and DCQCN to support RDMA/RoCEv2 communication.
  • Lead initiatives to implement NetDevOps practices and develop automation for provisioning, configuration management, and network remediation.
  • Design and deploy high-resolution telemetry pipelines to monitor network health, detect microbursts, and analyze congestion patterns.
  • Support modeling, deployment, configuration, and monitoring of data center network fabrics, including scale-out, scale-up, and front-end networks.
  • Collaborate cross-functionally with hardware engineers, AI researchers, and data center operations teams to co-design high-performance infrastructure.
  • Provide technical leadership and mentorship to network engineers while establishing best practices and operational standards.
  • Contribute to the long-term networking strategy and roadmap for Graphcore's AI infrastructure.
  • Research and evaluate next-generation high-speed networking technologies and vendor solutions.

Candidate Profile

Essential

  • BS or MS, or equivalent experience, in Computer Science, Electrical Engineering, Network Engineering, or a related technical discipline.
  • 12 years of progressive network engineering experience, with at least 3 years in hyperscale, high-density, or HPC data center environments.
  • Expert-level knowledge of data center routing and switching protocols, including BGP, OSPF, and EVPN-VXLAN architectures.
  • Strong operational understanding of RDMA networking technologies such as RoCEv2 or InfiniBand.
  • Hands-on experience with modern merchant-silicon networking platforms and NOS platforms such as Arista EOS, Cisco NX-OS, or SONiC.
  • Experience deploying high-speed network technologies, including 400G/800G optics and large-scale fabric architectures.
  • Proficiency in automation and scripting languages such as Python, Go, Bash, or similar tools.
  • Strong collaboration and communication skills across cross-functional engineering teams.

Desirable

  • Experience operating large-scale AI or GPU clusters.
  • Familiarity with network telemetry frameworks and streaming analytics.
  • Experience implementing NetDevOps workflows and infrastructure automation pipelines.
  • Experience influencing vendor roadmaps or evaluating next-generation networking technologies.


Required Experience:

Staff IC
