The ML Platform team builds the tools and infrastructure that power 40 ML & AI applications, including recommendations, ads, visual search, notifications, content understanding, and Trust & Safety. Our team consists of:
- ML Training: Training compute platform (including distributed and GPU training), PyTorch-based training environment, and model management & deployment.
- ML Serving: Online inference, including large-scale ranking at tens of millions of requests per second; GPU acceleration; and ML feature/score monitoring.
- ML Data: Feature and training dataset management, plus data governance tools (ownership, lineage, usage tracking, and monitoring) for 400 signals owned by teams across the company.
We are seeking a Senior Staff Software Engineer to help drive technical strategy across these teams. Our long-term objectives include:
- Enable advanced model architectures: language models, multi-modal models, large embeddings, and large user sequence models. Increasingly large models present new challenges for training and serving.
- Improve system efficiency: GPU efficiency and overall cost management often go hand in hand with more sophisticated model architectures.
- Increase developer velocity: solve major bottlenecks in the development of large-scale ML systems to speed up iteration on ML features and models.
What you'll do:
- Tackle ambiguous problem areas by gathering input from modeling and infrastructure engineers across the company, proposing and aligning on generalized solutions, and driving the implementation with a team of platform engineers.
- Prototype, investigate, and understand the latest technologies from industry and academia, and find opportunities to build and deploy them at our scale.
- Identify the top business-impacting ML application projects and collaborate with ML engineers to drive them forward.
- Provide technical mentorship and guidance to junior engineers within the team.
What we're looking for:
- In-depth experience with production ML use cases and systems at scale, including distributed systems architectures, big data processing (e.g., Spark, Flink), and training.
- Understanding of modern deep learning techniques, performance optimizations, and GPUs.
- Experience with and workflow management.
- Understanding of the needs of large, collaborating ML teams: governing the lifecycle and ongoing quality of features, datasets, and models, and tracking their dependencies/lineage.
- Experience in platform engineering - developing solutions for a user base of other engineers.
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
In-Office Requirement Statement:
- We let the type of work you do guide the collaboration style. That means we're not always working in an office, but we continue to gather for key moments of collaboration and connection.
- This role will need to be in the office for in-person collaboration 1-2 times per quarter, and therefore can be based anywhere in the country.
Relocation Statement:
- This position is not eligible for relocation assistance. Visit our PinFlex page to learn more about our working model.
Required Experience:
Staff IC