HPC & Cloud Infrastructure Engineer

Encora

Job Location:

Singapore - Singapore

Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Important Information

Location: Singapore

12-month contract

Job Summary

We're hiring an HPC & Cloud Infrastructure Engineer to design, deploy, and optimize high-performance computing environments across on-prem and cloud. You'll manage HPC clusters, interconnects, and job schedulers, and enable AI/ML workloads at scale while driving automation and cost efficiency.

Job Description

Architect, deploy, and manage HPC clusters with job schedulers, parallel file systems, and cluster management tools

Design, configure, and troubleshoot InfiniBand high-throughput, low-latency interconnects for HPC/distributed workloads

Own PBS Professional scheduling: deployment, queue optimization, custom job submission scripts, and workload management

Administer RHEL-based systems: performance tuning, package management, security hardening, and patching via Red Hat Satellite and Ansible

Build and maintain cloud HPC environments on AWS, Azure, and GCP: provisioning, hybrid setups, migrations, and cost optimization

Implement Infrastructure as Code using Terraform/Ansible and integrate with CI/CD pipelines for reproducible infrastructure

Enable GPU & AI/ML workloads: containers, TensorFlow, PyTorch, scikit-learn, Keras, MXNet; support MLOps pipelines for training and deployment

Optimize parallel applications using MPI and OpenMP; debug and scale distributed/shared memory workloads

Drive monitoring, logging, and alerting for cluster health, job efficiency, and resource utilization

Required Skills and Experience

High-Performance Computing

Hands-on experience managing HPC clusters with job schedulers, cluster management tools, parallel programming libraries, and parallel file systems.

Knowledge of resource scheduling and job optimization for efficient workload management.

InfiniBand (Networking)

Hands-on experience with high-throughput, low-latency interconnect technologies such as InfiniBand.

Ability to design, configure, and troubleshoot interconnects in HPC or distributed environments.

Operating Systems and Environments

Administration and configuration of RHEL-based systems.

Performance tuning, package management, and security hardening.

Knowledge of Red Hat Satellite and Ansible for automation.

Job Scheduling with PBS Professional

Experience in deploying and managing PBS Professional for scheduling and workload management in HPC environments.

Customizing job submission scripts and optimizing job queues.

Parallel Programming Libraries

MPI (Message Passing Interface) and OpenMP (Open Multi-Processing):

Proficiency in writing, debugging, and optimizing parallelized code.

Experience with scaling applications across HPC systems.

Understanding of distributed memory (MPI) and shared memory (OpenMP) paradigms.

Cloud Platforms

AWS, Azure, Google Cloud:

Expertise in provisioning, configuring, and managing services on all three platforms.

Cross-platform migration and hybrid cloud solutions knowledge.

Proficiency in managing high-performance computing (HPC) clusters on the cloud.

Deep understanding of cost optimization, security, and cloud-native development tools (e.g. Kubernetes, Terraform).

Infrastructure as Code (IaC)

Ability to design, deploy, and maintain infrastructure using automation and configuration management tools.

CI/CD pipeline integration for IaC workflows.

GPU & AI Libraries and Tools

Hands-on experience with container technologies.

Hands-on experience with TensorFlow, PyTorch, scikit-learn, Keras, or MXNet.

Familiarity with AI/ML pipelines, model training, and optimization.

Knowledge of MLOps tools for deploying and monitoring models.

About Encora

Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, and AI & LLM Engineering, among others.

At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.


Required Experience:

IC

About Company

As Encora Inc. expands its footprint in Latin America, its acquisition of Nearsoft gives our clients a unique opportunity to nearshore on a global scale.