Role: Solution Architect
ITAR Resource: YES (Only USC and GC)
Location: Preferably Albany in NY
Duration: 12 months
Full Time: $105k
Onsite/Hybrid/WFH: Onsite
JD:
Seeking a high-performing Senior Solution Architect to lead the convergence of traditional High-Performance Computing (HPC) and modern cloud-native architectures. The architect will be responsible for designing and optimizing large-scale containerized environments using Docker, Mirantis, and the ELK Stack while integrating advanced batch schedulers to handle intensive computational workloads.
Core Responsibilities
Architecture & Design: Develop end-to-end solutions for hybrid cloud environments, integrating Mirantis Container Cloud with dedicated HPC clusters to balance performance and elasticity.
HPC Orchestration: Design and implement job scheduling strategies using tools like Slurm, Volcano, or LAVA to manage high-throughput batch jobs and ensure deterministic resource allocation for AI/ML and scientific simulations.
Optimization & Performance Tuning: Implement best practices for container performance, including multi-stage builds, minimal base images (e.g., Alpine), and resource limits (CPU/Memory/GPU), to minimize overhead and prevent noisy-neighbor issues.
Centralized Observability: Lead the architecture of an enterprise-grade ELK Stack (Elasticsearch, Logstash, Kibana) for real-time monitoring of HPC jobs, utilizing Index Lifecycle Management (ILM) to handle massive log volumes efficiently.
Full-Stack Automation: Build robust Infrastructure-as-Code (IaC) pipelines using Terraform and Ansible to automate the deployment of Mirantis Kubernetes Engine (MKE) and integrated HPC schedulers.
CI/CD Automation: Implement and manage continuous integration and delivery pipelines using Jenkins, GitLab CI, or Argo Workflows to ensure reliable, automated build and deployment processes.
Hybrid Integration: Design bridges between Kubernetes and traditional schedulers, allowing specialized workloads to utilize high-speed interconnects like InfiniBand while maintaining container-native management.
Required Technical Skills
Containers & Mirantis: Deep expertise in Docker Runtime, Mirantis Kubernetes Engine (MKE), and Lens Desktop for cluster management.
HPC Schedulers: Proven experience with Slurm, PBS, or Kubernetes-native batch schedulers (e.g., Volcano) for managing priority queues and gang scheduling.
ELK Stack Mastery: Advanced knowledge of Logstash pipeline optimization, shard allocation strategies in Elasticsearch, and creating actionable performance dashboards in Kibana.
Performance Tools: Experience with hardware-software integration tools such as NVIDIA Enroot/Pyxis for running containers on HPC clusters with bare-metal-level performance.
Security & Compliance: Implementation of secure container registries, RBAC, and encrypted communications (TLS) across the entire stack.
Experience & Qualifications
Professional Background: 10 years in systems architecture, with 5 years specifically in HPC, Cloud Infrastructure, or DevOps at scale.
HPC Knowledge: Familiarity with MPI (Message Passing Interface) and low-latency networking requirements.
Certification: Preferred certifications include Certified Kubernetes Administrator (CKA) or Mirantis-specific technical certifications.
Cloud Platforms: AWS experience preferred, especially for HPC or containerized workloads on EKS, Batch, FSx for Lustre, and EC2 GPU instances.