Principal Software Engineer, Compute Provisioning

Roblox

Job Location:

San Mateo, CA - USA

Monthly Salary: Not Disclosed
Posted on: 3 days ago
Vacancies: 1 Vacancy

Job Summary

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we're building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world and on any device. We're on a mission to connect a billion people with optimism and civility, and we're looking for amazing talent to help us get there.

A career at Roblox means you'll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.


As a Principal Software Engineer on the Fleet Management team, you will lead the systems that provision and rebuild Roblox's global fleet across bare metal and cloud. The team owns provisioning and MAPI, the global Machine API that turns raw capacity into production-ready infrastructure in minutes, across hundreds of thousands of machines in on-prem and cloud environments, including new GPU and AI platforms. You will shape the technical direction for this critical compute platform, unify diverse hardware and environment-specific workflows behind MAPI, and drive large-scale maintenance operations like firmware updates and hardware tuning.

You will:

  • Lead the Machine Bootstrap pod in building and evolving provisioning and fleet management at massive scale.
  • Architect and extend MAPI, the unified Machine API that abstracts bare-metal GPU hosts and cloud instances behind a single global interface.
  • Ship fleet-wide maintenance operations (BIOS updates, firmware updates, configuration changes) to hundreds of thousands of machines through MAPI.
  • Drive best-in-class provisioning performance: minutes to fully rebuild a machine from scratch.
  • Evaluate and integrate new hardware platforms including GPU servers and AI accelerators into the provisioning pipeline.
  • Collaborate across Compute, Networking, and Cloud teams on the full machine lifecycle, from rack-and-stack to production.

You have:

  • 8 years of experience with strong expertise in distributed systems and infrastructure.
  • Bachelor's degree in computer science or equivalent field.
  • Strong proficiency in Go, C/C++, Rust, or other system-level programming languages.
  • Experience building and operating large-scale distributed systems that other engineering teams depend on.
  • Familiarity with bare-metal concepts (PXE/iPXE, DHCP, BMC/IPMI/Redfish, OS imaging) is a plus; deep low-level systems experience is a bonus, not a requirement.
  • Interest in modern server hardware, including GPU servers, AI accelerators, and cloud infrastructure.
  • A track record of building high-performance automation at fleet scale and reducing toil through developer-friendly APIs.

Required Experience:

Staff IC

About Company

Roblox is the ultimate virtual universe that lets you create, share experiences with friends, and be anything you can imagine. Join millions of people and discover an infinite variety of immersive experiences created by a global community!
