Hello,
My name is Shubham Pal and I am a Staffing Specialist at Sapear Inc. I am reaching out to you about an exciting job opportunity with one of our clients.
Title: Site Reliability Engineer (SRE), ML Platform
Location: Austin, TX or Sunnyvale, CA
Type: FTE/FTC
Responsibilities:
- Continuous deployment using GitHub Actions, Flux, and Kustomize
- Design and implement cloud solutions and build MLOps on AWS
- Containerize and deploy data science models using Docker, vLLM, and Kubernetes
- Communicate with a team of data scientists, data engineers, and architects, and document the processes
- Develop and deploy scalable tools and services for our clients to handle machine learning training and inference.
- Knowledge of ML models and LLMs
Qualifications:
- 6 years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS.
- Good understanding of Apache Solr.
- Proficient with Linux administration.
- Knowledge of ML models and LLMs.
- Ability to understand tools used by data scientists and experience with software development and test automation
- Ability to design and implement cloud solutions and ability to build MLOps pipelines on cloud solutions (AWS)
- Experience working with cloud computing and database systems
- Experience building custom integrations between cloud-based systems using APIs
- Experience developing and maintaining ML systems built with open-source tools
- Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, as well as with Docker and Kubernetes
- Experience developing containers and working with Kubernetes in cloud computing environments
- Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
- Ability to translate business needs to technical requirements
- Strong understanding of software testing, benchmarking, and continuous integration
- Exposure to machine learning methodology and best practices
- Good communication skills and ability to work in a team
Note: The role is weighted 60% SRE and 40% MLOps.
Skill Area | Includes | Weight (%) |
Platform Reliability & Containerization | Kubernetes, Docker, microservices, Linux | 30% |
MLOps & AWS Cloud | Model deployment, versioning, monitoring, AWS (SageMaker, S3, Lambda, EKS) | 25% |
CI/CD & GitOps | GitHub Actions, Flux | 15% |
Monitoring & Observability | Splunk, Grafana, Prometheus, performance tracking | 15% |
Integration & Collaboration | Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists & engineers | 15% |
Regards,
Shubham Pal
Lead Business Development Manager
Sapear Inc.
Email :
Cell : 1