This is a remote position.
Vertex is looking for MLOps and LLMOps Engineers who can make AI systems reliable, observable, and scalable in production.
We are building a curated pool of senior AI engineers for upcoming roles with partner/client companies. Selection into the pool is based on experience, technical depth, and demonstrated production impact.
Responsibilities
- Design and maintain ML and LLM deployment pipelines
- Build CI/CD workflows for models and inference services
- Implement monitoring for model performance, drift, and cost
- Optimize infrastructure for scalable training and inference
- Work with ML and product teams to support production AI
Requirements
- Strong experience with cloud infrastructure and DevOps practices
- Experience deploying ML or LLM systems in production
- Familiarity with containerization and orchestration tools
- Experience with monitoring, logging, and reliability practices
- Production-first mindset for AI systems
Benefits
For this role, selection is into the Vertex Talent pool, based on experience, technical depth, and demonstrated production impact.
- An energised, upbeat environment that strongly fosters employee work-life balance.
- A work culture that rewards goal-oriented professionals who enjoy meeting challenges head-on.
- An amazing personal growth experience.
- Working with a motivated and talented team.
- More importantly, an opportunity to meaningfully contribute to bringing cutting-edge tech solutions to life.
Required Skills:
- CI/CD for Machine Learning (GitHub Actions, GitLab CI, or Jenkins)
- Infrastructure as Code (Terraform, CloudFormation, or Pulumi)
- Containerization & Orchestration (Docker & Kubernetes/K8s)
- Cloud Infrastructure Management (AWS, GCP, or Azure)
- ML & LLM Deployment Pipelines (Seldon, BentoML, or SageMaker)
- Model Observability & Monitoring (Prometheus, Grafana, or Arize)
- Drift & Cost Tracking (Performance & Token Usage Monitoring)
- Automated Inference Scaling (Auto-scaling & Load Balancing)
- Logging & Reliability Engineering (ELK Stack or Datadog)
- GPU/Compute Resource Optimization
Required Education:
BSc.