USD 310,000 - 460,000
1 Vacancy
About the Team
OpenAI's Inference team ensures that our most advanced models run efficiently, reliably, and at scale. We build and optimize the systems that power our production APIs, internal research tools, and experimental model deployments. As model architectures and hardware evolve, we're expanding support for a broader set of compute platforms (for example, AMD GPUs) to increase performance, flexibility, and resiliency across our infrastructure.
We are forming a team to generalize our inference stack, including kernels, communication libraries, and serving infrastructure, to alternative hardware architectures such as AMD GPUs.
About the Role
We're hiring engineers to scale and optimize OpenAI's inference infrastructure across emerging GPU platforms. You'll work across the stack, from low-level kernel performance to high-level distributed execution, and collaborate closely with research, infrastructure, and performance teams to ensure our largest models run smoothly on new hardware.
This is a high-impact opportunity to shape OpenAI's multi-platform inference capabilities from the ground up.
In this role, you will:
Design and optimize high-performance GPU kernels for AMD accelerators using HIP, Triton, or other performance-focused frameworks (see the first sketch after this list).
Build and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs (see the second sketch after this list).
Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into AMD-backed systems.
Debug and optimize distributed inference workloads across memory, network, and compute layers.
Validate correctness, performance, and scalability of model execution on large AMD GPU clusters.
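For a flavor of the kernel-level work described above, here is a minimal Triton vector-add kernel; Triton's ROCm backend compiles the same Python source for AMD GPUs. This is an illustrative sketch, not OpenAI code, and the names (`add_kernel`, `add`) are ours.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real inference kernels (attention, GEMM epilogues) layer tiling, vectorized loads, and per-architecture tuning on top of this basic pattern.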
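The collectives work centers on primitives such as all-reduce, which combines partial results (for example, tensor-parallel matmul shards) across GPUs. Below is a minimal sketch using PyTorch's distributed API; on ROCm builds of PyTorch, the "nccl" backend is backed by RCCL. The script layout and launch command are illustrative assumptions, not OpenAI infrastructure.

```python
import torch
import torch.distributed as dist

def main():
    # On ROCm builds of PyTorch, the "nccl" backend maps to RCCL.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank holds a partial result; all-reduce sums them in place,
    # the same collective used to combine tensor-parallel outputs.
    t = torch.full((4,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=8 allreduce_demo.py` (the filename is hypothetical); each rank prints the same summed tensor.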
You can thrive in this role if you:
Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.
Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.
Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.
Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.
Are excited to be part of a small, fast-moving team building new infrastructure from first principles.
Nice to Have:
Contributions to open-source libraries like RCCL, Triton, or vLLM.
Experience with GPU performance tools (Nsight, rocprof, perf) and memory/communication profiling.
Prior experience deploying inference on AMD or other non-NVIDIA GPU environments.
Knowledge of model/tensor parallelism, mixed precision, and serving 10B-parameter models.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.
OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement
For US-Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Full-Time