Salary Not Disclosed
1 Vacancy
HUD (YC W25) is developing agentic evals for Computer Use Agents (CUAs) that browse the web. Our CUA Evals framework is the first comprehensive evaluation tool for CUAs.
Our Mission: People don't actually know whether AI agents are working. To make AI agents work in the real world, we need detailed evals for a huge range of tasks.
We're backed by Y Combinator and work closely with frontier AI labs to provide agent evaluation infrastructure at scale.
HUD is a fast-growing startup. If you can't find a role on our job board, feel free to suggest a new one and we'll reach out if we find a good fit. :)
Responsibilities:
Building new evaluations/eval environments for HUD's CUA evaluation framework
Building out our CUA evals framework
Conducting outbound sales, developing partnerships, and improving developer experience for CUA developers
Leading and supporting teams of research engineers as they build out our evals
General startup operations as we scale
Strong candidates may have:
Engagement with AI Safety and AI alignment
Understanding of LLM evaluation frameworks, particularly multimodal and agentic evaluations
Familiarity with using and deploying the latest AI tools for operational efficiency
Experience in full-stack LLM deployment, particularly for multimodal and agentic AI evaluations
Prior experience in fast-growing startup teams
Team Size: 5-10 people currently, with significant growth planned.
Our team: Our team includes four international Olympiad medallists (IOI, ILO, IPhO), serial AI startup founders, and researchers with publications at ICLR, NeurIPS, etc.
Employment: Full-time preferred, but we're willing to consider internship offers.
Location: Remote-friendly, but if you're in the San Francisco Bay Area, we do have an office where we can work together. We prioritise applicants who can attend meetings in Pacific Time (UTC-7:00/-8:00) or China/Singapore Time (UTC+8:00).
Visa Sponsorship: We provide support for relocation and visas for strong full-time candidates. For part-time/contract/internship arrangements, we'll work fully remote (which makes things simpler anyway).
Timeline: Applications are rolling. The process should involve 1-2 interviews and take less than a week.
We prioritize operational aptitude and cultural fit. Motivated candidates are encouraged to apply even if they don't meet all criteria.
Due to high volume, we may not actively respond to every application, but feel free to contact us at or elsewhere if we missed yours!
Full-Time