As a Senior / Principal Inference Engineer on ML Platform, you will build the next generation of ML ecosystem tooling, specifically around model inference. ML Platform today supports billions of requests per day across our homepage, marketplace, economy, and more. We are looking for accomplished engineers to help build out the next generation of ML platform tooling for high-scale inference in a rapidly evolving space.
You Will:
- Set technical strategy and oversee development of reliable infrastructure systems for large-scale inference, especially as we scale up both inference QPS and model size.
- Dig into performance bottlenecks all along the inference stack, from model optimizations to infrastructure optimizations.
- Stay abreast of industry trends in machine learning and infrastructure to ensure the adoption of leading-edge technologies and practices.
- Bootstrap and maintain infrastructure for ML Platform components: Serving Layer, Metadata Store, Model Registry, and Pipeline Orchestrator.
- Partner across organizations to build tooling, interfaces, and visualizations that make the platform a delight to use.
You Have:
- 4 years of professional experience and a tool chest of system design experience to draw on in building scalable, reliable platforms for all of Roblox.
- Experience building complex distributed systems that scale to real-time ML inference serving, ideally for real-time recommendation systems handling millions of QPS.
- Experience debugging complicated infrastructure-level performance issues to enable low-latency, high-throughput inference.
- Bachelor's degree or higher in Computer Science, Computer Engineering, Data Science, or a similar technical field.
You Are:
- Passionate about working cross-functionally with internal partners (Data Scientists and ML Engineers) to understand and meet their needs.
- A reliability nut: you love digging into tricky postmortems and identifying and fixing weaknesses in complicated systems.
- Ideally familiar with ML model inference frameworks such as Triton Inference Server, TensorRT, and KServe.
Required Experience:
Staff IC