Role and responsibilities
Providing support to our customers
Configuring, testing, and deploying new GPU clusters
Improving cluster benchmarking and monitoring
Creating, maintaining, and updating automation workflows
Creating and improving our documentation
Required skills and experience
8 years of total experience in:
System administration
DevOps
HPC systems
8 years of extensive Linux expertise, with thorough knowledge of its subsystems (network, block, etc.) and proficiency in performance optimization
5 years of advanced troubleshooting skills and mindset
5 years of Ansible and Bash scripting
3 years of Slurm or Kubernetes
2 years of proven troubleshooting experience with InfiniBand networks
1 year of experience with high-performance parallel filesystem storage systems
Desired skills and experience
GPUs
Hypervisors
Full Time