REQUIREMENTS:
- Experience: 7.5 years
- 10-12 years in infrastructure, platform, DevOps, or MLOps roles
- Strong experience with cloud platforms (AWS/GCP/Azure) and Kubernetes
- Hands-on experience deploying and operating LLMs (OpenAI, Anthropic, open-source models)
- Proficiency with GPU infrastructure, model serving frameworks, and vector databases
- Strong programming skills in Python; experience with Bash/Go is a plus
- Experience with monitoring, logging, and performance tuning for distributed systems
Preferred Qualifications:
- Experience with LLM fine-tuning, RAG pipelines, and prompt/version management
- Familiarity with tools like Terraform, Helm, Argo, Ray, or similar
- Exposure to cost optimization strategies for large-scale AI systems
Responsibilities:
- Design and manage scalable infrastructure for training, fine-tuning, serving, and monitoring LLMs
- Build and maintain LLMOps pipelines (deployment, versioning, rollback, monitoring, evaluation)
- Optimize inference performance (latency, throughput, cost) across GPU/accelerator stacks
- Implement CI/CD, IaC, and automation for AI/ML workloads
- Ensure observability, reliability, and governance of LLM systems in production
- Collaborate with ML platform and product teams to operationalize AI solutions
- Manage security, compliance, and access control for model and data pipelines
Qualifications:
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Remote Work:
No
Employment Type:
Full-time