Principal HPC Network Engineer (remote in the EU)
Job Summary
Role Overview:
We are seeking a highly skilled Senior HPC Networking Engineer to design, deploy, manage, and troubleshoot high-performance networking environments. The ideal candidate will have deep expertise in InfiniBand technologies, strong general networking knowledge, and hands-on experience with Fortinet solutions. You will play a critical role in ensuring the performance, reliability, and scalability of HPC infrastructure.
Key Responsibilities:
Design, deploy, and maintain high-performance network infrastructures for HPC environments, with a strong focus on InfiniBand fabrics.
Troubleshoot complex network issues across InfiniBand and Ethernet environments, ensuring minimal downtime and optimal performance.
Manage and optimize InfiniBand components, including switches, HCAs, subnet managers, and fabric configurations.
Perform performance tuning, monitoring, and capacity planning for HPC networking systems.
Implement and maintain network security using Fortinet solutions (FortiGate, FortiManager, FortiAnalyzer).
Diagnose and resolve issues related to routing, switching, latency, and throughput across hybrid network environments.
Collaborate with compute, storage, and platform teams to support HPC workloads and cluster operations.
Develop and maintain documentation for network architecture, configurations, and operational procedures.
Participate in on-call rotations and provide escalation support for critical incidents.
Lead or contribute to network upgrades, migrations, and new deployments.
Qualifications:
Required:
5 years of experience in network engineering with a focus on HPC or data center environments.
Strong hands-on experience with InfiniBand technologies (e.g., Mellanox/NVIDIA).
Solid understanding of networking fundamentals: TCP/IP, routing protocols (BGP, OSPF), VLANs, QoS, and network design.
Proven experience deploying and troubleshooting Fortinet solutions (FortiGate, FortiManager, VPNs, firewall policies).
Experience with network performance analysis and troubleshooting tools.
Familiarity with Linux systems and scripting for automation (e.g., Bash, Python).
Strong analytical and problem-solving skills.
Preferred:
Experience with large-scale HPC clusters or AI/ML infrastructure.
Knowledge of RDMA, MPI, and low-latency networking concepts.
Certifications such as FCSS/FCNSP (Fortinet), CCNP/CCIE, or equivalent.
Experience with automation and Infrastructure as Code tools (e.g., Ansible, Terraform).
Soft Skills:
Strong communication and collaboration skills.
Ability to work independently and handle complex technical challenges.
Detail-oriented with a proactive approach to problem-solving.
Additional Information:
We offer:
- Operate some of the most advanced AI infrastructure environments in production today.
- Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
- Help define operational standards and reliability practices for next-generation AI infrastructure services.
- Influence the adoption of AI-powered operational capabilities through k0rdent AI.
- Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
- Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.
We are a Leader for Container Management in G2 (#2 after AWS)!
Remote Work: Yes
Employment Type: Full-time
About Company
Mirantis is an open cloud company that helps organizations achieve digital self-determination by giving them complete control over their strategic infrastructure. The company combines intelligent automation and cloud-native expertise for managing and operating virtual machines, contai ...