Sr Site Reliability Engineer Compute SRE

Job Location

San Mateo, CA - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

What You'll Do:

The Infra Compute SRE mission is to own and manage the successful operation of our underlying cell infrastructure system, along with elements of service discovery, secrets management, and related software layers. We're looking for skilled Site Reliability Engineers with strong programming skills to help us build Roblox's private cloud, productionize our growing Kubernetes-based infrastructure, and institute reliability best practices across the Roblox Compute team.

You Will:

  • Design and Develop systems and libraries that promote fault tolerance and resilience, automate much of the management and lifecycle of our clusters, and ensure systems are observable.
  • Promote and Institute reliability best practices across the Infra Compute group, drive common reliability initiatives, and provide collaborative technical reviews and operational guidance to strengthen system reliability.
  • Build, Automate, and Standardize process automation to create a golden path of tooling and platform support that powers the fundamental Roblox ecosystem.
  • Create Tooling that provides production guardrails by evaluating release candidate capacity with load-testing tooling before deploying to production.
  • Create Performance Monitoring Services and observability for understanding capacity issues and platform degradations, and for monitoring production services and their changes, such as generalized canarying services with alerting.
  • Analyze systems and system designs for production readiness.

You Have:

  • A Bachelor's degree (or equivalent professional experience) in Computer Science or a related engineering field, with a proven track record including at least 6 years as an SRE or Software Engineer.
  • Fluency with high-level programming languages like Go, Java, and C#.
  • Experience with Kubernetes or similar orchestration systems. Experience in Nomad, Vault, and Consul is strongly desired.
  • Experience and good habits around building software and tools and getting them adopted. Your systems focus informs a view that code needs to be deeply reliable.

You Are:

  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Developer: You love building durable and reliable complex systems.
  • Passionate about problem-solving, finding creative work solutions, and addressing unexpected challenges as part of a team.
  • A Problem Solver: You ask the right questions to tackle issues within your expertise, and you use data to test your theories.
  • A Planner: You have experience in large project lifecycles. You have experience working in sprints, breaking down complex tasks into achievements, and reporting status to keep project scheduling accurate.

Required Experience:

Senior IC

Employment Type

Full Time
