Staff Software Engineer, Ads ML Inference Infrastructure

Job Location:

San Francisco, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Staff Software Engineer, Ads ML Inference Infrastructure

The Ads ML Inference Infra team owns the online inference and feature serving systems that power real-time model scoring and delivery for all Ads models at Pinterest. The team is looking for a staff engineer with strong hands-on experience in large-scale ML inference systems and the ability to solve ambiguous technical problems and drive strategic cross-functional efforts.

What you'll do:

Lead and drive efforts to build next-generation model inference and feature serving systems that power up to 100x larger models and directly uplevel Pinterest's monetization business.
Design and optimize low-latency, high-throughput inference pipelines to meet strict SLOs while improving performance, efficiency, and cost.
Partner with Ads ML and product teams to productionize new model architectures (including LLMs and multi-stage ranking models) and scale them reliably to global traffic.
Evolve the online feature platform (feature computation, caching, and retrieval) to improve coverage, freshness, and consistency for Ads models.
Evaluate and integrate new technologies (e.g., GPU acceleration, model compression, Triton, vLLM, Dynamo) to advance our inference stack.
Build strong partnerships with other infra and ML teams to improve end-to-end reliability, observability, and developer velocity for Ads ML.
Mentor and coach other engineers, guiding them through technical decisions, system design, and career development.

What we're looking for:

BS (or higher) degree in Computer Science or a related field.
8 years of relevant industry experience designing and operating large-scale production ML or distributed infra systems.
Deep knowledge of at least one programming language (Java, C, Python).
Deep experience with distributed systems or recommendation / ads serving infrastructure (e.g., request routing, online storage, caching, feature serving APIs).
Hands-on experience with at least one deep learning framework (PyTorch or TensorFlow) and bringing models from offline experimentation to production.
Preferred: Experience with model / hardware accelerator libraries (e.g., CUDA, quantization, distillation, low-precision inference).
Preferred: Experience with inference optimization and serving frameworks such as Triton, vLLM, or Dynamo.
Proven track record of leading complex projects, setting technical direction, and collaborating across functions and orgs; experience mentoring and coaching other engineers.

In-Office Requirement Statement:

We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
This role will need to be in the office for in-person collaboration 1-2 times per week and therefore needs to be within commutable distance of one of the following offices: Palo Alto, CA; San Francisco, CA; Seattle, WA.

Relocation Statement:

This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.

#LI-HYBRID

#LI-AG8

Required Experience:

Staff IC
