Freelance Agent Evaluation Analyst

Kraków - Poland

Monthly Salary: PLN 20 - 60

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Company Intro

At Toloka AI, we create data that powers leading GenAI models and innovations. We work with frontier labs, big tech, renowned AI startups, enterprises, and non-profit research organizations worldwide. We use a combination of Experts, Crowd, and Tech Platform to teach AI models to reason and evaluate their efficacy and safety. We have experts in more than 50 different domains, from doctors and lawyers to physicists and engineers, and boast one of the most diverse global crowds, representing over 100 countries and speaking 40 languages. We are a well-funded startup with an enviable portfolio of clients including Anthropic, Amazon, Microsoft, poolside, Recraft, and Shopify.

Recently, we secured strategic investment led by Bezos Expeditions with participation from Mikhail Parakhin, CTO of Shopify and board advisor to leading GenAI companies, who now serves as our Chairman of the Board. Our remote-first team is distributed around the world: USA, UK, the Netherlands, Israel, Czech Republic, Serbia, and more. We are headquartered in Amsterdam.

About the Role

We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking - it's about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale.

You'll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you'll be expected to understand the "why" behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.

This is a flexible, impact-driven role where you'll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.

This role is especially well-suited for:

Analysts, researchers, or consultants with strong structuring and reasoning skills
Junior product managers or strategists curious about AI and evaluation work
Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases

You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.

What you'll be doing

Fully own the QA pipeline for agent evaluation tasks;
Review and validate tasks and golden paths created by scenario writers and experts;
Spot logical inconsistencies, vague requirements, hidden risks, and unrealistic assumptions;
Provide structured feedback and ensure quality alignment across contributors;
Train, onboard, and mentor new QA team members;
Collaborate with domain experts, delivery managers, and engineers to improve test clarity and coverage;
Maintain and improve QA checklists, SOPs, and review guidelines;
Contribute to test planning, prioritization, and quality benchmarks;
Take initiative to suggest new approaches, tools, and processes that help scale validation and analysis.

What you should know / be able to do

Strong analytical and critical thinking skills;
Attention to detail and reliability - your work can be trusted without double-checking;
Experience in manual QA, scenario validation, or similar analytical work;
Comfortable working with structured formats (JSON/YAML);
Clear written communication and documentation skills;
Ability to give constructive feedback and coach others;
Capable of working with a wide range of stakeholders: from engineers to directors/VPs.

Nice to have

Background in scenario-based testing, test design, or annotation workflows;
Experience with AI/LLM evaluation, prompt validation, or agent behavior testing;
Some technical independence (e.g. Python skills);
Familiarity with MCP / tool-based task execution;
Experience working in cross-functional teams across product, delivery, and engineering.

Who you are

Detail-obsessed but also able to see the bigger picture;
Proactive and independent, taking true ownership of your work;
Strong communicator who can turn complex findings into actionable insights;
Flexible and motivated to contribute across a variety of tasks and projects;
Believe that quality is not just checking work, but making the whole product better.

What we can offer

Freelance full-time contract (B2B)
Flexible payment based on the results of work;
Flexibility: we offer freelance collaboration. You will also design with your manager a workday that works best for you;
Hourly rate: 20-60 EUR;
Friendly community.

Important Notice: Scam Alert Regarding Fake Job Postings

It has come to our attention that an individual or group is fraudulently impersonating Toloka to post fake jobs and solicit personal information from applicants. Please be aware:

Official Communication: Our recruiting team will only contact you from an official email address. We will NEVER use Gmail, Yahoo, Tolokainc, or other personal or seemingly business email accounts.
Our Process: We will never ask for your bank account details, credit card number, or any fees as part of the application or interview process.
Official Listings: All legitimate job openings are posted on our official careers page.
What to do: If you see a suspicious job posting or have been contacted by someone you suspect is a scammer, please do not provide any personal information. Instead, report the incident to us directly and report the profile/post to the platform hosting it. We are taking this matter very seriously and are working with the appropriate parties to resolve it.

Thank you for your vigilance!
