Job Description
Head of Research, Post-Training & Reinforcement Learning
Ready to shape how the next generation of AI is trained, aligned, and supervised?
This role is about leading one of the most critical research agendas in AI today: advancing post-training and reinforcement learning methods that ensure increasingly capable models remain aligned, reliable, and safe. You'll define the environments and frameworks where frontier models learn, and set the direction for how society supervises AI as it surpasses human performance.
As Head of Research, you'll guide a team of applied ML and research experts from FAIR, Meta Reality Labs, Airbnb, Amazon, and beyond. You'll stay hands-on with the research: designing experiments in RLHF, DPO, and GRPO; developing reward models that move beyond exact-match signals; and building complex RL environments that stress-test reasoning, planning, and long-horizon behaviour. At the same time, you'll shape the technical vision, ensuring the team's work translates into production systems already used by leading AI labs.
You'll also play a visible role in the broader ecosystem: publishing at top venues (NeurIPS, ICLR, ACL, EMNLP), releasing benchmarks and open-source tools, and influencing both technical standards and broader policies for AI alignment and evaluation.
You should bring:
- Deep research experience in post-training or RL methods (RLHF, DPO, GRPO, reward modelling).
- Strong background in training and evaluating large language models.
- Proven publication record at top-tier venues (NeurIPS, ICLR, ICML, ACL, EMNLP).
- Experience leading research teams and scoping high-impact projects.
- Curiosity, creativity, and the ability to thrive in a fast-moving startup environment.
Package: $300k-$400k base, significant equity. Full benefits including health, dental, vision, 401k, unlimited PTO, and global offsites. Onsite in San Francisco preferred (relocation support available), with flexibility for exceptional candidates.
If you want to define how reinforcement learning environments and post-training frameworks shape the future of AGI, this is the role for you.
All applicants will receive a response.
Required Experience:
Director