Cloud DevOps Engineer #GeneralInternship

Singapore - Singapore

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

As a DevOps Engineer Intern for Singtel's GPU Cloud, you will help implement processes and integrate operations to advance customers' AI and HPC capabilities. You will be exposed to both physical data center implementation and software solutions in the Singtel RE:AI GPU Cloud. This position requires a forward-thinking individual who thrives in dynamic environments and is committed to driving continuous improvement in GPU infrastructure for AI and HPC environments. This is an excellent opportunity for someone eager to start their career in DevOps and grow their expertise in AI and HPC cloud platforms.

Responsibilities

Assist in deploying and supporting GPU clusters for AI and ML workloads.
Support automation tasks for provisioning GPU resources in on-prem and cloud platforms.
Learn and contribute to CI/CD pipeline setup for AI models and GPU-accelerated applications.
Monitor basic cluster usage, health, and performance under supervision.
Assist in automating infrastructure provisioning and monitoring.
Support troubleshooting of system-level issues (e.g. Slurm, Kubernetes, GPU drivers, CUDA, IB networking) with guidance from senior engineers.
Participate in system benchmarking and stay updated on advancements in GPU technologies.
Help set up monitoring and logging tools (e.g. Zabbix, Prometheus, NVIDIA DCGM).
Learn and apply basic security practices in a multi-tenant GPU cloud environment.
Collaborate with senior engineers and administrators to streamline workflows.
Provide user support under supervision for GPU-accelerated systems.
Work closely with senior DevOps engineers to identify bottlenecks and improve processes.
Gain hands-on learning experience in high-performance distributed computation for AI and HPC workloads.

Requirements

Currently pursuing a Bachelor's degree in Computer Science/Engineering, Information Technology, Systems Engineering, or a related field.

Basic knowledge of Linux system administration (Ubuntu, CentOS, Rocky Linux, etc.) through coursework or personal projects.
Exposure to DevOps tools such as Jenkins, Kubernetes, Ansible, or Terraform.
Understanding of core DevOps concepts (e.g. CI/CD, automation, monitoring) with willingness to learn further.
Familiarity with scripting languages (Python, Bash) for simple tasks or assignments.
Exposure to monitoring solutions such as Zabbix or Prometheus is a plus.
Interest in AI frameworks such as TensorFlow or PyTorch, with coursework or project experience preferred.
Awareness of cloud architectures (IaaS, PaaS) and GPU technologies, including NVIDIA GPUs.
Good verbal and written communication skills in English.
Collaborative mindset and ability to work effectively in a team environment.
Strong interest in developing problem-solving and analytical skills for system optimization.

Desirable qualifications

Understanding of how collective communications (MPI, RDMA, and NCCL) work, as well as how GPU-specific acceleration works on a GPU cluster.
Knowledge of DevOps/MLOps technologies in GPU clusters, such as Docker/containers, Kubernetes, and data center deployments.
Understanding of AI and HPC networking technologies such as InfiniBand, RoCE, and DPUs.

Understanding of how AI and HPC workloads interact with both GPU hardware and software infrastructure.

Required Experience:

Intern

About Company

Singtel

The Singtel Group, Asia's leading communications group, provides a diverse range of services including fixed, mobile, data, internet, TV, infocomms technology (ICT), and digital solutions.
