About the Role
We are looking for a Forward Deployment Engineer (FDE) to work directly with customers to design, deploy, and validate inference & reinforcement learning POCs on GMI's GPU infrastructure. This is a hybrid role spanning platform engineering, applied ML, and customer success: you'll turn research ideas into performant systems on real GPU clusters.
What You'll Do
- Own customer POCs end-to-end: deploy and optimize LLM inference, RL training, and post-training workflows
- Work hands-on with research teams, startups, and enterprise customers
- Debug performance, stability, and correctness issues in real environments
- Stand up and tune inference stacks (vLLM, SGLang, Ray Serve, etc.)
- Optimize latency, throughput, GPU utilization, and cost efficiency
- Support RLHF / RFT / SFT workflows using customer datasets
- Diagnose GPU, networking, and distributed-system bottlenecks
- Feed customer learnings back into platform SDKs and APIs
What You Bring
- Strong software engineering background (Python required; Go/Rust a plus)
- Hands-on experience with ML inference or training systems
- Familiarity with distributed systems and GPUs (multi-GPU, multi-node)
- Experience with LLM inference frameworks (vLLM, SGLang, Ray Serve, Triton, etc.) (nice to have)
- Experience with RL or post-training workflows (RLHF, RFT, SFT) (nice to have)
- Knowledge of PyTorch, DeepSpeed, Megatron-LM, or Kubernetes-based ML platforms (nice to have)
- Comfort working directly with customers and ambiguous requirements
- Ability to debug end-to-end systems (code, infra, networking, performance)
- Bilingual in English and Mandarin (required to interface with engineering teams in China)
- 2 years of experience (deep systems expertise not required, but a strong execution mindset is)
- Fast learner with strong communication and presentation skills
Why Join Us
- Work on cutting-edge inference and RL workloads, not toy demos
- Close to real users and real GPUs, not abstract roadmaps
- High ownership, fast iteration, and visible impact
- Visa sponsorship (H-1B) and green card support available based on performance