Senior DevOps Engineer - ML Engineering Support

Job Location: Bengaluru, India

Monthly Salary: Not Disclosed

Vacancy: 1

Job Description

Teamwork makes the stream work.

Roku is changing how the world watches TV

Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.

From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.

About the Role

We are seeking a talented and experienced Senior Software Engineer, DevOps/SRE, to join our dynamic team and play a critical role in supporting Machine Learning Engineering activities. The ideal candidate will have a strong background in DevOps practices, cloud infrastructure management, automation, and MLOps tooling, along with team leadership skills.

If you have a proven track record architecting and scaling ML/AI platforms, enjoy solving intriguing system challenges at internet scale, are innovative at heart, and thrive in building infrastructure that accelerates ML experimentation and deployment, this role might be a great fit for you!

What You'll Be Doing

  • Provide technical leadership and guidance to DevOps/SRE engineers supporting ML Engineering initiatives; mentor team members in best practices, technologies, and methodologies.
  • Design, implement, and maintain scalable and resilient cloud infrastructure (AWS & GCP) optimized for ML workloads, including GPU/TPU orchestration and distributed training.
  • Partner with ML Engineers to streamline the end-to-end ML lifecycle: data ingestion, feature engineering, training, evaluation, deployment, and monitoring.
  • Build and maintain CI/CD pipelines for ML applications and models using GitHub Actions, GitLab CI/CD, Argo, or Tekton.
  • Integrate with MLOps platforms (e.g., MLflow, Kubeflow, Airflow, SageMaker, Vertex AI) to ensure reproducibility and traceability of experiments.
  • Lead incident response efforts for ML-serving and training infrastructure, minimizing downtime and ensuring high availability.
  • Implement observability practices for ML workloads, including model performance monitoring, drift detection, and metrics via Prometheus, Grafana, and Datadog.
  • Collaborate with security and compliance teams to ensure adherence to data governance, PCI, SOX, and AI/ML data security standards.
  • Optimize system resources for large-scale ML jobs, including auto-scaling GPU clusters, cost optimization, and quota management.
  • Drive continuous improvement across DevOps and MLOps processes; proactively identify areas for enhancement.
  • Maintain clear documentation and foster a culture of knowledge sharing across DevOps, ML, and Data Engineering teams.
  • Participate in a 24x7 on-call rotation, with availability to work with global teams in the event of critical outages.

We're Excited if You Have

  • 8 years of experience in DevOps/SRE roles, including at least 2-3 years supporting ML or data-intensive workloads.
  • Strong programming skills in Python or Go; experience building internal tools and automation for ML pipelines.
  • Hands-on experience with Kubernetes, Docker, ECS/EKS/GKE, and service mesh tools such as Istio or Envoy.
  • Familiarity with GPU/accelerator orchestration (NVIDIA GPU Operator, Kubeflow, Slurm, Ray, or similar).
  • Experience with Infrastructure as Code (IaC): Terraform, Helm, Ansible, or CloudFormation.
  • Deep understanding of distributed systems, microservices architecture, and cloud-native design patterns.
  • Exposure to MLOps tools: MLflow, Kubeflow Pipelines, Airflow, Argo, Vertex AI, or SageMaker.
  • Strong proficiency in cloud platforms (AWS and GCP required; Azure a plus).
  • Knowledge of data engineering concepts (object storage like S3/GCS, Parquet/ORC, data versioning with DVC or Delta Lake).
  • Experience with networking, security, and compliance (role-based access, VPC design, encryption, auditing).
  • Demonstrated success in cross-functional collaboration with ML, Data, and Product teams.
  • Preferred certifications: Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, NVIDIA Deep Learning Institute courses.
  • AI literacy and curiosity: you have either tried Gen AI in your previous work or outside of work, or are curious about Gen AI and have explored it.
  • BS degree in Computer Science or equivalent experience.

Benefits

Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits, which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Our employees can take time off work for vacation and other personal reasons to balance their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.

The Roku Culture

Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a smaller number of very talented folks can do more for less cost than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast, and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV.

We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002.

To learn more about Roku, our global footprint, and how we've grown, visit [link].

By providing your information, you acknowledge that you want Roku to contact you about job roles, that you have read Roku's Applicant Privacy Notice, and that you understand Roku will use your information as described in that notice. If you do not wish to receive any communications from Roku regarding this role or similar roles in the future, you may unsubscribe here at any time.


Required Experience: Senior IC

Employment Type: Full Time
