Pinterest is seeking a Staff Software Engineer, Capacity Engineering, focused on managing and optimizing ML infrastructure. The team is responsible for efficiently managing one of the largest-scale cloud-native infrastructures in the world.
This role is highly impactful because efficiency is an ongoing strategic priority for Pinterest. The role has direct visibility across Pinterest Engineering and with Engineering and company leadership. The team is looking for a candidate with a strong background in ML infrastructure and a focus on efficiency and optimization.
What you'll do:
- Manage the ML hardware capacity that powers the models running at Pinterest
- Improve the efficiency of ML Infrastructure at Pinterest
- Build, develop, and mature profiling and optimization capabilities for ML Infrastructure at Pinterest scale
- Collaborate with ML Platform, Infrastructure Engineering, and SRE teams in their mission to deliver highly available, resilient, secure, and efficient ML foundations for Pinterest's tech stack
What we're looking for:
- Deep understanding of GPU architectures, PyTorch, etc.
- Deep understanding of supporting parts of the ML software stack, such as scheduling, data, and storage
- Hands-on experience with shared platforms like Kubernetes
- Strong technical and performance engineering skills to collaborate with stakeholders on complex and ambiguous technical challenges
- Experience building and managing highly available distributed applications at scale
- Proficiency in software development languages such as Java, Python, and C
- Excellent skills in communicating complex technical issues
- Understanding of ML models, kernels, and optimization opportunities
- Hands-on experience with large cloud-native, multi-tenant platforms at Internet scale
- Experience with AWS or similar cloud environments
- Deep understanding of infrastructure capacity and performance
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
In-Office Requirement Statement:
We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
- This role will need to be in the office for in-person collaboration 1-2 times/quarter and therefore can be situated anywhere in the country.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
#LI-HYBRID
Required Experience:
Staff IC