Overview
We are seeking a highly technical Senior Platform Engineer with deep expertise in Linux engineering, OpenStack development, Kubernetes, and GPU-enabled infrastructure to design, build, and operate SIG's next-generation infrastructure platforms supporting trading and core technology environments.
This is a hands-on engineering role focused on building and tuning scalable, resilient, and high-performance infrastructure systems across CPU and GPU workloads. The ideal candidate will have strong Linux internals knowledge, experience developing and operating cloud-native platforms, and a deep understanding of distributed systems architecture, including the efficient provisioning, isolation, and performance tuning of accelerator-based compute resources.
What we're looking for
Linux Systems Engineering
- Deep troubleshooting across kernel, networking stack, storage, and performance layers.
- Performance tuning for low-latency systems (CPU pinning, NUMA, IRQ balancing, kernel tuning).
- Develop automation using Python, Go, or similar languages.
- Build and maintain infrastructure tooling and internal platform services.
- Implement high-availability solutions and disaster recovery strategies.
- Perform root cause analysis for production incidents affecting distributed systems.
- Design, deploy, and operate GPU-enabled infrastructure. Optimize GPU utilization (memory bandwidth, PCIe throughput, Multi-Process Service (MPS), MIG partitioning where applicable).
- Tune workloads to efficiently leverage NVIDIA GPUs (or equivalent accelerators) for compute-intensive applications.
- Troubleshoot GPU driver, CUDA, kernel module, and firmware-related issues in production environments.
OpenStack Development & Cloud Infrastructure
- Develop and extend OpenStack services (Nova, Neutron, Cinder, Keystone, etc.).
- Build custom integrations and automation around OpenStack APIs.
- Optimize compute, networking, and storage performance for high-performance workloads.
- Design multi-tenant OpenStack architectures with strong isolation and security.
- Contribute to infrastructure-as-code frameworks managing OpenStack environments.
- Debug and resolve deep issues across hypervisors (KVM), networking layers, and control plane services.
- Integrate OpenStack environments with Kubernetes platforms (hybrid cloud architectures).
Kubernetes Platform Engineering
- Design, build, and operate highly available, production-grade Kubernetes clusters.
- Develop and maintain Kubernetes operators, controllers, and custom resource definitions (CRDs).
- Implement advanced scheduling, multi-tenancy, and workload isolation strategies.
- Optimize cluster performance for low-latency and high-throughput workloads.
- Integrate Kubernetes with CI/CD pipelines and GitOps workflows.
- Implement cluster observability using Prometheus, Grafana, OpenTelemetry, etc.
- Design and enforce networking policies (CNI) and ingress architecture.
- Implement secure cluster design, including RBAC, OPA/Gatekeeper, secrets management, and runtime security.
Automation & Infrastructure as Code
- Design and maintain infrastructure using Terraform, Ansible, Helm, or similar tools.
- Build CI/CD pipelines for infrastructure and platform deployments.
- Implement immutable infrastructure and GitOps methodologies.
- Create automated validation, testing, and deployment frameworks for platform services.
Required Technical Skills
- Advanced Linux systems knowledge (kernel, networking, storage)
- Experience deploying and operating GPU-enabled Linux servers
- Understanding of CUDA, drivers, and GPU kernel modules
- Performance profiling and tuning of workloads for compute-intensive applications
- Hands-on OpenStack development and operations experience
- Strong experience administering and engineering production Kubernetes clusters
- Strong understanding of distributed systems principles:
  - Consensus
  - Replication
  - Fault tolerance
  - CAP theorem tradeoffs
- Experience with:
  - Python or similar programming languages
  - Infrastructure as Code (Terraform, Ansible)
  - Container runtimes (containerd, CRI-O)
  - Observability stacks (Prometheus, Grafana, ELK)
Desirable Experience
- Experience in low-latency or high-performance trading environments
- High-performance networking (DPDK, SR-IOV, CNI tuning)
- Storage systems (Ceph, distributed storage, NVMe optimization)
- Contribution to open-source projects (Kubernetes, OpenStack)
- Experience designing multi-region or hybrid cloud architectures
- Experience tuning AI/ML, quantitative, or high-performance compute workloads on GPUs
- Experience with NVIDIA DCGM, MIG (Multi-Instance GPU), or vGPU configurations
- Familiarity with RDMA, GPUDirect, or high-throughput interconnects
- Experience optimizing containerized ML or compute pipelines
Key Attributes
- Strong systems thinking and deep technical curiosity
- Ability to diagnose complex cross-layer failures
- Passion for building reliable, scalable distributed systems
- Comfortable operating in high-availability, high-performance production environments
- Strong documentation and knowledge-sharing mindset
Required Experience:
Senior IC