Distinguished Engineer AI

San Jose, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Distinguished Engineer - AI Infrastructure

We are seeking a Distinguished Engineer with unrivaled depth in AI/ML inferencing at scale and the distributed systems foundations that power it. You will architect and ship our next-generation AI infrastructure and inferencing platform, serving millions of requests with uncompromising latency, throughput, and reliability requirements.

You set the technical North Star, translating high-stakes business problems into elegant, defensible architectures that teams rally behind. You drive consensus through technical authority, shaping roadmaps where your architectural decisions become company strategy. You invent solutions where standard approaches fail and turn constraints into lasting competitive moats.

Core Expertise Required:

AI Inferencing & ML Systems: Deep hands-on experience with high-performance inference engines (TensorRT, vLLM, ONNX Runtime, Triton), model optimization (quantization, pruning, distillation), and serving patterns for LLMs and computer vision models at scale. Proven track record building and architecting RAG pipelines and optimizing retrieval-augmented generation workflows for production latency targets.
Distributed Systems at Scale: 15 years architecting fault-tolerant, low-latency distributed systems. Expert-level understanding of consensus protocols, distributed state management, and data consistency models under partition. Experience with high-performance filesystems and storage engines optimized for AI workloads (checkpoints, model artifacts, training datasets).
AI Infrastructure & Platform Engineering: Built enterprise-grade or SaaS platforms specifically designed for AI/ML workloads: model registries, feature stores, inference gateways, and multi-tenant serving infrastructure. Deep familiarity with GPU/TPU cluster orchestration, memory hierarchy optimization, and heterogeneous compute scheduling.
High-Performance Data Planes: Designed and implemented high-throughput, low-latency networking stacks for critical data path operations. Expertise in RDMA, DPDK, kernel bypass techniques, and custom protocols for inter-service and accelerator-to-accelerator communication.
Security & Multi-Tenancy: Hardened multi-tenant ML infrastructure with robust isolation, end-to-end encryption, key management for model weights, and fine-grained RBAC/ABAC for data scientists and production workloads.
Cloud-Native Orchestration: Expert in Kubernetes scheduling extensions (device plugins, custom controllers), service mesh for AI microservices, and API gateway patterns for model serving.

Job Requirements

About the team

ONTAP is NetApp's flagship storage operating system. The ONTAP team drives the product strategy, roadmap, and engineering delivery for ONTAP software and systems. You are responsible for developing innovative solutions and architecture for ONTAP software and systems, spanning the areas of filesystems and storage, security, networking, and protocols. The solutions you architect and design will drive mission-critical applications, AI infrastructure, and cloud workflows for Fortune 500 companies.

What will you do:

Provide the technology strategy to accelerate the pace of innovation within NetApp and the industry.
Define the roadmap and long-term vision derived from key business priorities, technology and competitive trends, and new and emerging customer use cases.
Partner with product management and engineering to define and deliver next-generation products for NetApp.
Influence executive management and engineering to contribute towards a competitive portfolio. Demonstrate influence and act as a force multiplier across engineers at NetApp.
Mentor senior and principal engineers and be the technical bar-raiser for senior and principal technical roles.
Evangelize, both internally and externally, the NetApp products you own and become an industry-recognized authority on related technologies and domains.

What will you bring:

Globally recognized as a domain expert in software and system design for highly scalable distributed storage and databases, and in the control and data plane architectures required to fuel large-scale infrastructure for serving Gen AI and AI-as-a-Service workloads.
Experience building highly resilient and scalable enterprise-grade products.
Experience building large-scale, compute-intensive stateful applications.
Ownership of product and system architecture for multiple significant projects.
Expertise in coding, design, architecture, subsystems, and technology trends.
Excellent communication skills for engaging executives and senior leadership on products, technology trends, and customer issues.

Compensation:
The target salary range for this position is 255,850 - 380,600 USD. The salary offered will be determined by the candidate's location, qualifications, experience, and education, and may be outside of this range. Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package. This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off, various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.

About Company

NetApp

At NetApp, our top priority is the health and safety of our event attendees and employees, including every community around the world being impacted by COVID-19. As a result, we have decided to reimagine our annual NetApp INSIGHT Paris and Berlin events to be fully digital. We’re als ...
