Most AI systems work in demos. Very few hold up in real customer environments.
This team is building the decision-making systems behind AI agents that operate across voice, chat, and email, where performance is measured in outcomes, not benchmarks.
You'll work on models that need to reason over time, handle multi-step workflows, and stay consistent across entire interactions. Not just once, but repeatedly, under real-world constraints.
This is applied research that ships. You'll take ideas from early concept through to production, owning how systems behave when deployed at scale.
The challenge is not just capability. It's reliability: making reasoning systems that can operate across long-context interactions, manage memory, use tools, and execute workflows without breaking down.
You'll be working closely with product and engineering teams, iterating on real-world failures and improving systems based on how they actually perform in production.
What you'll work on
Designing and improving reasoning systems for real-world agent workflows
Building and refining memory, retrieval, and multi-step execution systems
Developing post-training and evaluation approaches for deployed models
Iterating on systems based on real user behaviour and performance
Taking research ideas through to production environments
What they're looking for
Experience working on LLM systems in production
Background in RL, post-training, or agent-based systems
Experience building systems involving memory, reasoning, or tool use
Strong engineering fundamentals and ability to ship end-to-end systems
Clear understanding of how models behave outside of controlled environments
Why this role
Work on systems judged by real users, not offline metrics
Direct ownership of how models behave in production
High autonomy in a fast-moving, product-driven team
Real-world complexity, not sandboxed problems
Package
San Francisco or London (on-site), $200K-$400K base + equity
All applicants will receive a response.
Required Experience:
Staff IC