Principal Software Engineer, Stateless Jobs Platform (Core Services)
San Mateo, CA - USA
Job Summary
Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.
At Roblox, we're building the tools and platform that empower our community to bring any experience they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world and on any device. We're on a mission to connect a billion people with optimism and civility, and we're looking for amazing talent to help us get there.
A career at Roblox means you'll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.
The Core Services team manages the core infrastructure and API stack and builds the high-throughput microservices that power the Roblox platform. These services need to be fast, reliable, and highly scalable, as they have a huge impact on the day-to-day experience of every Roblox user. In addition, the team owns the shared libraries, infrastructure microservices, and web infrastructure used by all other Roblox full-stack feature teams. We ship testable, configurable features that allow for rapid experimentation and data collection, and we optimize for performance and user engagement. From serving basic user information to populating in-experience content, Core Services is integral to the Roblox experience.
We are building a massive-scale, multi-region platform designed to power the next generation of global real-time experiences. At the intersection of Cloud Engineering and AI Infrastructure, you will build the foundation for a platform that supports millions of concurrent users, defining how stateless jobs are executed at a scale that pushes the boundaries of standard open-source tooling.
As the orchestrator for our global inference and microservices footprint, our platform provides a "deploy and forget" experience for critical workloads. You won't just be managing clusters; you will be building the custom control plane that automates scheduling, scaling, and recovery across a hybrid-cloud environment, ensuring our infrastructure remains resilient regardless of where it runs.
You will:
- Build the Orchestration Engine: Design and develop custom Kubernetes Operators and Controllers in Go to automate the entire lifecycle of high-throughput, mission-critical stateless workloads.
- Architect Hybrid-Cloud Mobility: Create systems that enable workloads to move seamlessly between on-premises and public cloud environments, ensuring high availability and automated failover during regional outages.
- Extend the Kubernetes Control Plane: Write performant reconciliation loops and Custom Resource Definitions (CRDs) to handle complex scheduling logic and resource optimization for massive CPU- and GPU-intensive fleets.
- Empower Developer Velocity: Build high-level platform abstractions and automation that allow service owners to deploy global-scale code without needing to manage the underlying container orchestration.
You have:
- 10 years of experience building web services using Golang or a similar language.
- Experience building and operating Kubernetes (K8s) clusters.
- Deep understanding of Kubernetes internals (control plane, reconciliation loops, scheduling, networking).
- Experience building large-scale distributed systems with a focus on scalability, reliability, and availability. Experience building or operating control-plane or orchestration systems (e.g., schedulers, workflow engines, or compute platforms).
- Strong knowledge of distributed systems fundamentals such as leader election, event-driven architectures, messaging/queuing, or distributed state management.
- Experience designing systems that handle multi-region orchestration, failover, disaster recovery, or large-scale reliability challenges.
- Experience with on-call rotations and troubleshooting live-site issues. Experience leading cross-team greenfield projects.
- Bachelor's degree in Computer Science or a related field, or equivalent experience.
- Experience writing Kubernetes Operators or custom controllers using Operator SDK or controller-runtime.
Required Experience:
Staff IC
About Company
Roblox is the ultimate virtual universe that lets you create, share experiences with friends, and be anything you can imagine. Join millions of people and discover an infinite variety of immersive experiences created by a global community!