GPU Performance Engineer | Experienced Hire

Job Location:

Bala Cynwyd, PA - USA

Monthly Salary: Not Disclosed
Posted on: 4 days ago
Vacancies: 1 Vacancy

Job Summary

Overview

We are looking for a GPU Performance Engineer to build highly optimized CUDA kernels for low-latency inference. This role focuses on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains.

You will work closely with quantitative researchers and engineers to understand model structure, identify computational bottlenecks, and turn mathematical ideas into production-grade GPU implementations. You will use your understanding of GPU hardware to help shape models that are both mathematically effective and efficient to run. The problems span compact neural networks, tree-based models, and other structured inference workloads where latency, throughput, and efficiency all matter.

This role is a strong fit for someone who enjoys low-level optimization, performance analysis, and translating abstract models into hardware-efficient code.

In this role, you will:

  • Design, implement, and optimize custom CUDA kernels for latency-critical inference workloads
  • Develop fine-grained GPU implementations tailored to specific model structures
  • Analyze quantitative research models and computational bottlenecks to identify opportunities for parallelization and hardware-efficient execution
  • Collaborate directly with quantitative researchers to translate mathematical models into high-performance compute pipelines
  • Optimize end-to-end inference performance through kernel tuning, memory-layout design, execution strategy, I/O optimization, and precision tradeoffs
  • Profile and benchmark GPU performance
  • Improve latency and throughput in production inference systems
  • Contribute to GPU architecture decisions and performance best practices

What we're looking for

  • Strong proficiency in writing and optimizing CUDA kernels
  • Solid programming experience in C/C++ (preferred)
  • Deep understanding of GPU architecture, including memory hierarchy, SIMT execution, occupancy, and latency/throughput tradeoffs
  • Ability to reason about numerical stability, precision/performance tradeoffs, and how model design choices affect hardware efficiency
  • Strong problem-solving skills and comfort working with low-level systems

Preferred qualifications:

  • PhD in Mathematics, Physics, Computer Science, Engineering, or a related quantitative field
  • Strong background in linear algebra, probability, numerical methods, or scientific computing
  • Experience working with quantitative research teams or financial models
  • Demonstrated ability to improve real-world inference performance beyond baseline framework or library implementations
  • Familiarity with PTX-level behavior, Tensor Core utilization, or architecture-specific tuning
  • Exposure to ONNX Runtime, TensorRT, Triton, TVM, or similar systems
  • Exposure to:
      • Neural networks
      • Tree-based models (e.g., LightGBM)
      • State space models (e.g., Mamba architectures)
  • Experience with kernel fusion, custom operators, model compilation, or graph-level optimization

About Susquehanna

If you're a recruiting agency and want to partner with us, please reach out to. Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.

#LI-Onsite


Required Experience:

Senior IC

About Company

Discover Susquehanna, a global quantitative trading firm built on a rigorous, analytical foundation in financial markets.