Job Description
Most AI systems work in demos. Very few hold up in real customer environments.
This team is building the decision-making systems behind AI agents that operate across voice, chat, and email, where performance is measured in outcomes, not benchmarks.
You'll work on models that need to reason over time, handle multi-step workflows, and stay consistent across entire interactions. Not just once, but repeatedly, under real-world constraints.
This is applied research that ships. You'll take ideas from early concept through to production, owning how systems behave when deployed at scale.
The challenge is not just capability. It's reliability: making reasoning systems that can operate across long-context interactions, manage memory, use tools, and execute workflows without breaking down.
You'll be working closely with product and engineering teams, iterating on real-world failures and improving systems based on how they actually perform in production.
What you'll work on
- Designing and improving reasoning systems for real-world agent workflows
- Building and refining memory, retrieval, and multi-step execution systems
- Developing post-training and evaluation approaches for deployed models
- Iterating on systems based on real user behaviour and performance
- Taking research ideas through to production environments
What they're looking for
- Experience working on LLM systems in production
- Background in RL, post-training, or agent-based systems
- Experience building systems involving memory, reasoning, or tool use
- Strong engineering fundamentals and ability to ship end-to-end systems
- Clear understanding of how models behave outside of controlled environments
Why this role
- Work on systems judged by real users, not offline metrics
- Direct ownership of how models behave in production
- High autonomy in a fast-moving, product-driven team
- Real-world complexity, not sandboxed problems
Package
San Francisco or London (on-site)
$200K-$400K base + equity
All applicants will receive a response.
Required Experience:
Staff IC