In this role, your daily impact spans the entire spectrum of systems engineering. One hour you might be performing routine lifecycle maintenance (patching a fleet of RHEL workstations or managing user identities across a heterogeneous domain) to ensure the baseline stability of our enterprise. The next, you are diving into the high-performance fabric, debugging a latency spike on an InfiniBand card or fine-tuning a Slurm scheduler to prioritize a mission-critical simulation.
You aren't just managing boxes; you are the bridge between raw silicon and national security breakthroughs. Whether it's the methodical hardening of a standard server build to meet SAP requirements or the high-adrenaline optimization of a multi-petabyte Lustre filesystem, your work ensures that our researchers never have to wait on the infrastructure to catch up with their imagination. This position is 100% on-site.
Responsibilities
- Architect & Deploy: Lead the design and lifecycle management of mission-critical Linux workstations, enterprise-grade servers, and high-performance computing (HPC) clusters.
- Engineer Filesystems: Master the art of data movement. Administer complex local and distributed filesystems (Lustre, GPFS/Spectrum Scale) to ensure extreme-speed access across the fabric.
- Infrastructure as Code (IaC): Treat the data center as a codebase. Develop sophisticated automation workflows using Python, Bash, and Ansible to eliminate manual toil and ensure drift-free configurations.
- Defensive Engineering: Implement "Hardened by Design" security. Fine-tune SELinux policies and advanced firewall configurations to protect sensitive data without sacrificing computational performance.
- Container Orchestration: Modernize scientific workflows by deploying and managing isolated environments using Podman, while working to establish a Kubernetes environment.
- HPC Performance Tuning: Push the limits of the silicon. Optimize cluster scheduling and management utilizing industry-leading tools like Bright Cluster Manager and Slurm.
- Low-Latency Networking: Configure and optimize high-bandwidth networking, including InfiniBand fabrics, for seamless inter-node communication.
- Technical Documentation: Author high-fidelity playbooks and strategic architectural diagrams that serve as the blueprint for our evolving infrastructure.
At COLSA, people are our most valuable resource and are central to our core values. We invite you to unite your talents with opportunity and be a part of our Family of Professionals!
Required Experience:
IC