1. Role Overview
Mercor is seeking experienced researchers and technical experts to contribute to a project supporting a frontier-model evaluation effort focused on agentic workflows. You'll design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help surface and diagnose reasoning and problem-solving gaps in a target STEM model. The work centers on building robust, real-world tasks with executable tests and then analyzing model/agent behavior.
2. Key Responsibilities
3. Core Qualifications
- Deep expertise in data science, machine learning, finance, and/or Python-based coding
- Active or recently graduated PhD (top U.S.-based school)
- Strong research background in frontier STEM topics
- Ability to engage reliably for 30 hours/week, primarily on weekdays
- Demonstrated technical output, such as high-quality open-source contributions (especially in agentic / LLM tooling ecosystems)
- Comfort reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors
4. More About the Opportunity
- Initial focus area: agentic workflows for STEM tasks
- Familiarity with agentic frameworks and OSS ecosystems is helpful (examples include LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, and Dify)
- Deliverables are expected to be reproducible and testable (clear specs, deterministic tests where possible, documented environments)
5. About Mercor
- Mercor is a talent marketplace that connects top experts with leading AI labs and research organizations.
- Our investors include Benchmark, General Catalyst, Adam D'Angelo, Larry Summers, and Jack Dorsey.
- Thousands of professionals across domains like law, creative fields, engineering, and research have joined Mercor to work on frontier projects shaping the next era of AI.