Job Description
Want to build the large-scale RL environments frontier labs use to train agents that can truly reason and act?
This team are creating complex reinforcement learning environments: simulations where advanced agents learn to plan, adapt, and solve multi-step problems that stretch beyond standard benchmarks. The focus isn't on training the models themselves but on building the worlds that make meaningful learning and evaluation possible, the foundation for more capable, aligned systems.
You'll work end-to-end across environment design, reward dynamics, and scalable simulation, developing the feedback loops that define what "good" looks like for intelligent behaviour. It's open-ended, research-driven work where the task definition, data, and reward structure are often the hardest and most important problems to solve.
You'll collaborate closely with researchers tackling unsolved challenges in reinforcement learning and agent behaviour: shaping experiments, scaling infrastructure, and refining how agents learn in the loop.
It suits someone with strong ML and RL experience, deep intuition for agent dynamics, and the curiosity to explore problems that don't come with clear instructions.
On-site in San Francisco. Compensation up to $300K base (negotiable depending on experience) plus equity.
If you want to help build the environments that teach the next generation of AI systems how to think, act, and adapt, we'd love to hear from you.
All applicants will receive a response.
Required Experience:
IC