Distinguished Engineer, Inference Serving Network and Storage

Graphcore

Job Location: Austin, TX - USA
Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

About us

Graphcore is a globally recognized leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data center hardware that provide the specialized processing power needed to drive AI innovation while delivering the efficiency required to support its broader adoption.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies.

Job Summary

We are seeking a Distinguished Engineer to lead the networking and storage architecture for a new inference serving initiative. This is a chief technologist role for the serving fabric and data path, responsible for defining and driving the end-to-end strategy for networking, storage, observability, provisioning, and automation in support of large-scale AI inference services.

You will shape core technical decisions that directly influence product capability, service differentiation, and competitive advantage. On the networking side, you will lead the design of the serving fabric, inter-partition latency paths, management network, QoS and transport tuning, segmentation, and observability. In terms of storage, you will define the architecture for model artifact storage, checkpoint distribution, KV and session tiering and restore, telemetry and log storage, and backup and disaster recovery.

Storage is expected to be a critical component of inference serving at scale, particularly for KV cache management, state movement, and service resiliency. You will therefore set technical direction across both the networking and storage domains as first-class pillars of the platform.
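
To make the KV tiering theme concrete, here is a minimal illustrative sketch, not a description of Graphcore's actual stack: a two-tier cache that keeps hot session state in a bounded fast tier and demotes least-recently-used entries to a larger, slower tier. All names below (TieredKVCache, hot_capacity, and so on) are hypothetical.

    from collections import OrderedDict

    class TieredKVCache:
        """Illustrative two-tier KV cache: a bounded hot tier (e.g. device
        memory) with LRU spill into a larger cold tier (e.g. host DRAM or
        remote storage). Names and structure are hypothetical."""

        def __init__(self, hot_capacity):
            self.hot_capacity = hot_capacity
            self.hot = OrderedDict()   # session_id -> kv_block, LRU order
            self.cold = {}             # spill tier for demoted sessions

        def put(self, session_id, kv_block):
            self.hot[session_id] = kv_block
            self.hot.move_to_end(session_id)
            while len(self.hot) > self.hot_capacity:
                evicted_id, evicted_block = self.hot.popitem(last=False)
                self.cold[evicted_id] = evicted_block   # demote coldest

        def get(self, session_id):
            if session_id in self.hot:
                self.hot.move_to_end(session_id)        # refresh recency
                return self.hot[session_id]
            if session_id in self.cold:
                block = self.cold.pop(session_id)
                self.put(session_id, block)             # promote on access
                return block
            return None                # miss: caller must recompute state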

This is a Grade 7 role for a recognized expert and thought leader who can convert strategic thinking into tangible group-level impact, lead a small team, and exert influence across functions and external partners.

The Team

You will be in the System Engineering group and work across organizational boundaries with ML software, applied AI, hardware and systems, inference service teams, and other platform and infrastructure groups. You will also engage closely with external partners responsible for key elements of the inference service stack, as well as business counterparts who depend on differentiated service capabilities, reliability, and scale.

This role requires strong technical leadership without relying solely on formal authority. You will be expected to align stakeholders, make architectural trade-offs clear, and drive execution across multiple teams, while raising the technical bar for the broader organization.

Responsibilities and Duties

  • Define and coordinate the networking architecture for inference serving, including serving fabric build-out, inter-partition latency path optimization, and management network architecture.
  • Lead the strategy for QoS, transport tuning, traffic isolation and segmentation, and service differentiation to support multiple inference SLAs and workload classes.
  • Drive the build-out of monitoring, resource prioritization, and automated management frameworks for network and storage systems at production scale.
  • Define the storage architecture for model artifact repositories, checkpoint distribution, session state, telemetry and log storage, and backup and disaster recovery.
  • Lead the design of KV cache storage, tiering, restore, and movement mechanisms as a core platform capability for large-scale inference serving.
  • Optimize network and storage subsystems for demanding AI and HPC workloads, balancing throughput, latency, resiliency, cost, and operational simplicity.
  • Work with ML software and inference service teams to develop infrastructure that supports current methods for deploying large language models, including disaggregated prefill/decode paths, continuous batching, and large-model scaling techniques (a continuous-batching sketch follows this list).
  • Guide the architecture for scaling models that use tensor, pipeline, expert, and other parallelism strategies, ensuring the serving infrastructure supports efficient execution and state movement.
  • Establish performance models, benchmarks, and tuning methodologies for end-to-end serving behavior, including tail latency, throughput stability, and recovery characteristics (a tail-latency measurement sketch also follows this list).
  • Lead a small, cross-functional team while providing technical direction and architectural oversight across a wider matrixed organization.
  • Influence roadmaps, standards, and implementation choices across internal teams and external partners.
  • Act as the senior technical authority for this domain, identifying risks early, resolving complex trade-offs, and ensuring the platform evolves in line with business and product needs.
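
As a purely illustrative sketch of the continuous-batching pattern named above (all names are hypothetical and the decode step is a toy stand-in), a scheduler can retire finished sequences and admit queued requests at every step instead of waiting for a full batch to drain:

    import collections
    import random

    class ContinuousBatcher:
        """Illustrative continuous-batching loop (hypothetical sketch):
        finished sequences are retired and queued requests are admitted
        at every decode step, rather than waiting for a full batch."""

        def __init__(self, max_batch_size):
            self.max_batch_size = max_batch_size
            self.queue = collections.deque()   # pending requests
            self.active = []                   # sequences currently decoding

        def submit(self, request_id, tokens_to_generate):
            self.queue.append({"id": request_id, "remaining": tokens_to_generate})

        def step(self):
            # Admit queued requests into any free batch slots.
            while self.queue and len(self.active) < self.max_batch_size:
                self.active.append(self.queue.popleft())
            # Toy "decode step": each active sequence emits one token.
            for seq in self.active:
                seq["remaining"] -= 1
            finished = [s["id"] for s in self.active if s["remaining"] == 0]
            self.active = [s for s in self.active if s["remaining"] > 0]
            return finished

    batcher = ContinuousBatcher(max_batch_size=4)
    for i in range(8):
        batcher.submit(f"req-{i}", tokens_to_generate=random.randint(1, 5))
    while batcher.queue or batcher.active:
        done = batcher.step()
        if done:
            print("completed:", done)

The point of the pattern is that short requests exit the batch as soon as they finish, so long-running sequences never block admission of new work.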
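
Similarly, a minimal tail-latency harness, again a hypothetical sketch rather than an established methodology here, can be as simple as recording per-request wall-clock time and reading off percentile cut points:

    import statistics
    import time

    def measure_tail_latency(send_request, num_requests=1000):
        """Illustrative tail-latency benchmark (hypothetical helper):
        record per-request wall-clock latency and report p50/p95/p99."""
        latencies = []
        for _ in range(num_requests):
            start = time.perf_counter()
            send_request()
            latencies.append(time.perf_counter() - start)
        cuts = statistics.quantiles(latencies, n=100)   # 99 percentile cuts
        return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

    # Example against a stand-in workload:
    print(measure_tail_latency(lambda: time.sleep(0.001), num_requests=100))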

Candidate Profile

Essentials

  • MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent practical experience.
  • Significant industry experience (typically 15 years) in large-scale systems, distributed infrastructure, or platform architecture.
  • Deep expertise in networking and storage software at scale, including architecture, implementation, configuration, and performance optimization.
  • Proven experience designing and operating networking and storage systems for demanding applications in AI, HPC, or large-scale cloud environments.
  • Strong understanding of high-performance transport, congestion and flow control, QoS, segmentation, telemetry, and production observability.
  • Strong understanding of distributed storage architectures: artifact distribution, checkpointing, caching, replication, backup, disaster recovery, and operational resilience.
  • Demonstrated ability to architect low-latency, high-throughput systems where network and storage behavior materially affect application performance.
  • Experience leading highly ambiguous, cross-functional technical initiatives with impact across multiple teams or product areas.
  • Strong communication and influencing skills, with the ability to align senior technical and business stakeholders.
  • Track record as a recognized expert who drives strategy, shapes technical direction, and delivers solutions beyond existing precedent.

Desirable

  • Familiarity with state-of-the-art LLM serving techniques and their infrastructure requirements.
  • Experience with prefill/decode-disaggregated inference, continuous batching, and differentiated inference services with multiple SLA and QoS tiers.
  • Understanding of model scaling and serving approaches involving tensor, pipeline, expert, and related parallelism techniques.
  • Experience with KV cache management, tiering, restore, and memory/storage trade-offs in inference systems.
  • Knowledge of modern inference serving algorithms, schedulers, and system-level optimization techniques.
  • Experience working with external technology partners, suppliers, or ecosystem collaborators in the delivery of complex infrastructure platforms.
  • Background in production-grade automation and provisioning systems for large infrastructure estates.


Benefits

In addition to a competitive salary, Graphcore offers a competitive benefits package. We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.


Required Experience: IC
