DevOps Engineer (AI Inference)

Gcore


Job Location:

Singapore - Singapore

Monthly Salary: Not Disclosed
Posted on: 10 hours ago
Vacancies: 1 Vacancy

Job Summary

As a DevOps Engineer, you will be responsible for designing, deploying, and maintaining infrastructure and services that enable scalable and secure AI inference workloads on-premises.

What You Will Do

  • Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
  • Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
  • Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale

Qualifications:

What We're Looking For

  • Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
  • Hands-on experience operating and troubleshooting production Kubernetes clusters.
  • Strong Linux and networking troubleshooting skills, including DNS, routing, firewalling, TLS, MTU, connectivity, and performance issues.
  • Ability to develop automation and operational tooling using Python, Go, or Bash.
  • Experience with Terraform, Ansible, or similar IaC/configuration management tools.
  • Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
  • Strong experience with Git-based workflows and CI/CD pipelines.

Preferred Qualifications

  • Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
  • Hands-on operation or administration of Slurm clusters.
  • Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
  • Background working with managed platforms, PaaS, or cloud services.
  • Exposure to bare metal, GPU, HPC, or other high-performance computing environments.

Nice to Have

  • Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
  • Knowledge of OpenStack or similar cloud infrastructure platforms.
  • Hands-on experience developing Kubernetes operators or controllers.

Additional Information:

Benefits 

At Gcore, we want you to do your best work and enjoy the journey. Our benefits are designed to support your growth, well-being, and life beyond work:

  • Competitive compensation
  • Flexible working hours and hybrid or remote options depending on your role 
  • Work from anywhere in the world for up to 45 days per year 
  • Private medical insurance for you and your family* 
  • Extra paid vacation and sick leave days* 
  • Support for life's important moments and celebrations
  • Language courses to help you connect and grow 
  • Modern, welcoming offices with snacks, drinks, and entertainment*
  • Team sports and social activities* 

*Benefits may vary depending on your location. 

Equal Opportunity Employer 

We provide equal opportunity to all applicants without regard to race, color, religion, sex, sexual orientation, age, gender identity, gender expression, national origin, disability, or any other legally protected characteristics.


Remote Work:

Yes


Employment Type:

Full-time

About Company

Have you ever wondered why your favorite apps, social media content, and video games load in the blink of an eye? It's likely because of Gcore behind the scenes! Join a team that collaborates with industry giants like Intel, Dell, NVIDIA, and Equinix to accelerate AI training, provid ...