Site Reliability Engineer – ML platform - Only W2

Saransh Inc


Job Location:

Sunnyvale, CA - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Title: Site Reliability Engineer (SRE), ML Platform

Location: Austin, TX or Sunnyvale, CA

ONLY W2

Responsibilities:

  • Continuous deployment using GitHub Actions, Flux, and Kustomize
  • Design and implement cloud solutions and build MLOps on the cloud (AWS)
  • Containerize and deploy data science models using Docker, vLLM, and Kubernetes
  • Communicate with a team of data scientists, data engineers, and architects, and document the processes
  • Develop and deploy scalable tools and services for our clients to handle machine learning training and inference.
  • Knowledge of ML models and LLMs

Qualifications:

  • 6 years of experience in MLOps, with strong knowledge of Kubernetes, Python, MongoDB, and AWS.
  • Good understanding of Apache Solr.
  • Proficient with Linux administration.
  • Knowledge of ML models and LLMs.
  • Ability to understand tools used by data scientists and experience with software development and test automation
  • Ability to design and implement cloud solutions and to build MLOps pipelines on the cloud (AWS)
  • Experience working with cloud computing and database systems
  • Experience building custom integrations between cloud-based systems using APIs
  • Experience developing and maintaining ML systems built with open-source tools
  • Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow; experience with Docker and Kubernetes
  • Experience developing containers and Kubernetes in cloud computing environments
  • Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
  • Ability to translate business needs to technical requirements
  • Strong understanding of software testing, benchmarking, and continuous integration
  • Exposure to machine learning methodology and best practices
  • Good communication skills and ability to work in a team

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting