Senior Software Engineer - Site Reliability ML

Job Location

San Mateo, CA - USA

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Are you a seasoned engineer with a passion for ML reliability? We're looking for exceptional Software Engineers to join the Reliability team at Roblox. In this pivotal role, you will drive the evolution of our ML systems, ensuring they meet the highest standards of performance, reliability, and efficiency. You'll collaborate with cross-functional teams to build robust ML infrastructure that supports our growth. If you have a track record of solving complex technical challenges, we want to hear from you. Join us in shaping the future of our platform and delivering unparalleled value to our users.

At Roblox, our vision is to achieve 1 billion daily active users. We believe this engineer will be instrumental in driving us towards that ambitious goal.

You Will:

  • Build, automate, and standardize processes to create a golden path of ML tooling and platform support that powers the Roblox ML ecosystem.
  • Create tooling that provides production guardrails for developing and delivering ML training and inference services to production.
  • Create performance monitoring and observability services for understanding ML capacity issues and platform degradations.

You Have:

  • Experience: You have a BS degree (or equivalent professional experience) in Computer Science or a related engineering field, with at least 6 years of experience, including at least 2 years in SRE or Software Engineering.
  • Deep experience running Kubernetes clusters in production environments at large scale, both on-premises and hosted.
  • Hands-on experience with observability, maintenance, and upgrades of large-scale Kubernetes clusters.
  • Experience running ML training and inference workloads on Kubernetes, supporting MLOps frameworks like Kubeflow, and working with GPUs.
  • Experience working with popular machine learning frameworks such as TensorFlow or PyTorch.
  • Passion for systems: You have experience and good habits around building software and tools and getting them adopted.

You Are:

  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Coder: You have experience writing in common programming languages (Python, Go, C#).
  • Self-organized: You're excited about getting in front of complex problems, organizing your work by any means possible, overcoming emergent issues, and contributing to long-running projects as part of the team.
  • Problem Solver: You ask the right questions to solve issues within your expertise, and you use data to test your theories.
  • Planner: You have experience in large project lifecycles. You have experience working in sprints, breaking down complex tasks into milestones, and reporting status to keep project scheduling accurate.

Required Experience:

Senior IC

Employment Type

Full Time

Company Industry

About Company

25 employees