Job Description
Want to build the simulated worlds that test what frontier models are really capable of?
This is a chance to join a team advancing the science of post-training and scalable evaluation, building reinforcement learning environments that push reasoning, planning, and long-horizon behaviour to their limits.
Instead of static benchmarks, you'll create dynamic simulations that measure real intelligence, not just accuracy. You'll design new post-training algorithms (RLHF, DPO, GRPO, and beyond), develop richer reward models that move past exact-match scoring, and build evaluation frameworks that define how next-generation AI is trained, aligned, and understood.
The work combines deep research with hands-on implementation, from writing papers to seeing your methods deployed in live systems. It's ideal for researchers who care about bridging academic insight and practical impact, helping AI progress beyond metrics that no longer tell the whole story.
You'll bring:
Research experience in post-training, reinforcement learning, or evaluation for LLMs.
Strong understanding of transformer models and experimental design.
Publication record at leading venues (NeurIPS, ICLR, ICML, ACL, EMNLP).
PhD or equivalent research experience in CS, ML, NLP, or RL.
Package: Up to $300K base (DOE), meaningful equity, comprehensive benefits (401k, unlimited PTO, relocation and sponsorship available).
Location: On-site in New York (preferred).
If you want to shape how AI is trained, tested, and trusted, this is the place to do it.
All applicants will receive a response.