Freelance AI Evaluation Consultant

Amsterdam - Netherlands

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Company Intro

At Toloka AI, we create data that powers leading GenAI models and innovations. We work with frontier labs, big tech, renowned AI startups, enterprises, and non-profit research organizations worldwide. We use a combination of Experts, Crowd, and Tech Platform to teach AI models to reason and to evaluate their efficacy and safety. We have experts in more than 50 different domains, from doctors and lawyers to physicists and engineers, and boast one of the most diverse global crowds, representing over 100 countries and speaking 40 languages. We are a well-funded startup with an enviable portfolio of clients including Anthropic, Amazon, Microsoft, poolside, Recraft, and Shopify.

Recently, we secured strategic investment led by Bezos Expeditions, with participation from Mikhail Parakhin, CTO of Shopify and board advisor to leading GenAI companies, who now serves as our Chairman of the Board. Our remote-first team is distributed around the world: USA, UK, the Netherlands, Israel, Czech Republic, Serbia, and more. We are headquartered in Amsterdam.

About the role:

We are seeking an analytical and technically minded professional to:

Evaluate AI outputs and processes
Ensure quality, accuracy, and reliability
Identify logical errors, risks, and structural inconsistencies
Provide actionable insights and recommendations to the team

Ideal candidates:

Consultants, auditors, analysts, data researchers, or business/technical analysts with strong reasoning skills
Professionals curious about AI, process improvement, and quality evaluation
Problem-solvers who enjoy analyzing complex systems, logic, and scenarios

Key Responsibilities:

Lead evaluation of AI outputs and related processes
Review tasks against expected/ideal scenarios; identify gaps and risks
Provide structured, actionable recommendations to engineers, domain experts, and managers
Maintain and improve evaluation guidelines, checklists, and SOPs
Suggest new approaches, tools, and processes to enhance AI evaluation

Experience & Background:

Scenario validation, data analysis, auditing, or consulting experience
Analytical work in research, technical/business analysis, or risk evaluation

Knowledge & Skills:

Strong analytical and critical thinking
Attention to detail, reliability, and an ownership mindset
Technical understanding: JSON/YAML, basic Git/GitHub
Clear English (B2) for communication and documentation
Independent, proactive mindset

Nice to Have:

Scenario-based testing, annotation workflows, AI/LLM evaluation
Experience in cross-functional teams

Important Notice: Scam Alert Regarding Fake Job Postings

It has come to our attention that an individual or group is fraudulently impersonating Toloka to post fake jobs and solicit personal information. Please be aware:

Official Communication: Our recruiting team will only contact you from an official email address. We will NEVER use Gmail, Yahoo, Tolokainc, or other personal or seemingly business email accounts.
Our Process: We will never ask for your bank account details, credit card number, or any fees as part of the application or interview process.
Official Listings: All legitimate job openings are posted on our official careers page.
What to Do: If you see a suspicious job posting or have been contacted by someone you suspect is a scammer, please do not provide any personal information. Instead, report the incident to us directly and report the profile/post. We are taking this matter very seriously and are working with the appropriate parties to resolve it.

Thank you for your vigilance!
