GPU Performance Engineer | Experienced Hire

Susquehanna International Group, LLP

Job Location:

Bala Cynwyd, PA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy


Job Summary

Overview

We are looking for a GPU Performance Engineer to build highly optimized CUDA kernels for low-latency inference. This role is focused on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains.

You will work closely with quantitative researchers and engineers to understand model structure, identify computational bottlenecks, and turn mathematical ideas into production-grade GPU implementations. You will use your understanding of GPU hardware to help shape models that are both mathematically effective and efficient to run. The problems span compact neural networks, tree-based models, and other structured inference workloads where latency, throughput, and efficiency all matter.

This role is a strong fit for someone who enjoys low-level optimization, performance analysis, and translating abstract models into hardware-efficient code.

What you'll do

Design, implement, and optimize custom CUDA kernels for latency-critical inference workloads
Develop fine-grained GPU implementations tailored to specific model structures
Analyze quantitative research models and computational bottlenecks to identify opportunities for parallelization and hardware-efficient execution
Collaborate directly with quantitative researchers to translate mathematical models into high-performance compute pipelines
Optimize end-to-end inference performance through kernel tuning, memory-layout design, execution strategy, I/O optimization, and precision tradeoffs
Profile and benchmark GPU performance
Improve latency and throughput in production inference systems
Contribute to GPU architecture decisions and performance best practices

What we're looking for

Strong proficiency in writing and optimizing CUDA kernels
Solid programming experience in C/C++ (preferred)
Deep understanding of GPU architecture, including memory hierarchy, SIMT execution, occupancy, and latency/throughput tradeoffs
Ability to reason about numerical stability, precision, performance tradeoffs, and how model design choices affect hardware efficiency
Strong problem-solving skills and comfort working with low-level systems

Preferred qualifications

PhD in mathematics, physics, computer science, engineering, or a related quantitative field
Strong background in linear algebra, probability, numerical methods, or scientific computing
Experience working with quantitative research teams or financial models
Demonstrated ability to improve real-world inference performance beyond baseline framework or library implementations
Familiarity with PTX-level behavior, tensor core utilization, or architecture-specific tuning
Exposure to ONNX Runtime, TensorRT, Triton, TVM, or similar systems
Exposure to neural networks, tree-based models (e.g., LightGBM), and state space models (e.g., Mamba architectures), and experience with kernel fusion, custom operators, model compilation, or graph-level optimization

About Susquehanna

Susquehanna is a global quantitative trading firm powered by scientific rigor, curiosity, and innovation. Our culture is intellectually driven and highly collaborative, bringing together researchers, engineers, and traders to design and deploy impactful strategies in our systematic trading environment. To meet the unique challenges of global markets, Susquehanna applies machine learning and advanced quantitative research to vast datasets in order to uncover actionable insights and build effective strategies. By uniting deep market expertise with cutting-edge technology, we excel in solving complex problems and pushing boundaries together.

If you're a recruiting agency and want to partner with us, please reach out. Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.

#LI-KH2

#LI-Onsite

Required Experience:

Senior IC
