Principal Engineer, Compute Platform

Pinterest


Job Location:

San Francisco, CA - USA

Monthly Salary: Not Disclosed
Posted on: 16 hours ago
Vacancies: 1 Vacancy

Job Summary

Pinterest serves over 600 million users through sophisticated visual and social capabilities that connect inspiration, advertising, and shopping. Compute Platform provides the underlying compute capabilities to run jobs and processes for all of the systems and workloads needed behind the scenes to create the best experience for our users and advertisers. This includes distributed processing, data systems, search, experimentation, monetization, AI/ML for ranking and recommendations, GenAI, and internal systems.

We are looking for a Principal Engineer who can lead and scale the consolidation and modernization of this infrastructure under what we call PinCompute, with an emphasis on some of the largest and most challenging stateful workloads as well as GPU-heavy AI workloads. The scale and scope of the effort will require designing and building around Kubernetes and solving its scaling limitations; handling stateful systems and data-intensive workloads; formalizing mechanisms to stack and bin-pack workloads; working with multiple internal customers and giving them migration paths; and working through ambiguous and unforeseen situations that arise from workload requirements, production and operability requirements, and unique multi-tenancy challenges.


What you'll do:

  • Solving the challenges of replacing isolated pools of dedicated compute resources with a very large-scale shared compute platform, shifting from machine-based designs to container-based designs.
  • Working with leads across various platforms, especially stateful and data platforms, to build the right features and migration paths that work for them.
  • Owning and driving up utilization on the shared compute platform by designing and implementing workload stacking, bin-packing optimizations, safe oversubscription, etc.
  • Working with multiple customers with unique requirements to make sure the platform addresses their needs and is not only a viable but also a desirable solution for running their workloads.
  • Leading a group of engineers on design topics, execution, trade-offs, migration paths, observability, performance, and operability for the platform.
  • Evolving the platform towards a multi-cloud abstraction layer to enable running workloads across multiple cloud providers.
  • Being a role model who sets a high bar for production quality and engineering excellence in delivering a foundational technology that empowers the entire company.
  • Working closely with partners on capacity planning, cost visibility, fungibility of virtual machine instance types, and efficiency.
  • Putting special focus on the delivery of GPU resources through the platform to enable and expedite AI workloads.


What we're looking for:

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 12 years of relevant industry experience with large scale production distributed systems.
  • 5 years of experience with Kubernetes in production.
  • Experience working across SWE and SRE or Production Engineering teams to deliver robust production systems.
  • Experience with running distributed data systems and migrating them to Kubernetes is highly preferred.
  • Ability to work with cross-functional partners across multiple organizations.
  • Passion for automation, reducing toil, and building proper tooling to get the job done.

In-Office Requirement Statement:

  • We recognize that the ideal environment for work is situational and may differ across departments. What this looks like day-to-day can vary based on the needs of each organization or role.
  • This role will need to be in the office for in-person collaboration 1-2 times per quarter and therefore can be situated anywhere in the country.


Relocation Statement:

  • This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.



Required Experience:

Staff IC



About Company


Join the people behind the product to build a more positive internet for Pinterest users worldwide.
