AI Benchmark Engineer | Native Language Specialist Marathi

New Delhi - India

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About The Opportunity

We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness across prompt-language effects, non-English data processing, and complex locale/encoding edge cases in terminal workflows.

We are seeking experienced native-speaking software engineers to design, build, and validate these benchmarks. You will create high-signal, high-quality tasks that genuinely test a model's ability to handle multilingual environments without relying on English translation as a crutch.

Note: this is a remote, freelance opportunity.

What You'll Deliver

Task Engineering: Design benchmark tasks for evaluating coding agents.
Asset Creation: Build realistic task environments using datasets and files in your native language. Crucially, these assets must remain in the target language to genuinely measure multilingual handling.
Prompting & Translation: Find failure points where AI models do not work well in your native language.
Implementation & Verification: Support the development of robust solutions (reference implementations) and write highly reliable, deterministic verifier scripts (using rubric-based judging only when strictly necessary).
Calibration & Execution: Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus).
Quality Assurance: Participate in a rigorous 4-layer human quality-control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity.
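
To illustrate what a deterministic verifier can look like, here is a minimal Python sketch. The task it checks (writing three Marathi city names to a UTF-8 file), the `EXPECTED_LINES` constant, and the `verify_output` function are all hypothetical and do not reflect Terminal-Bench's actual harness or LILT's task format. The point is only that every check is exact and reproducible, with no model-based judging.

```python
import tempfile
import unicodedata
from pathlib import Path

# Hypothetical expected answer for an illustrative task: the agent must write
# these Marathi city names, one per line, as NFC-normalized UTF-8.
EXPECTED_LINES = ["पुणे", "मुंबई", "नागपूर"]


def verify_output(path: Path) -> tuple[bool, str]:
    """Return (passed, reason) using only exact, deterministic checks."""
    if not path.is_file():
        return False, "output file missing"
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")  # strict decoding rejects other encodings
    except UnicodeDecodeError:
        return False, "output is not valid UTF-8"
    if unicodedata.normalize("NFC", text) != text:
        return False, "output is not NFC-normalized"
    lines = text.splitlines()
    if lines != EXPECTED_LINES:
        return False, f"expected {EXPECTED_LINES!r}, got {lines!r}"
    return True, "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.txt"
        good.write_text("\n".join(EXPECTED_LINES) + "\n", encoding="utf-8")
        print(verify_output(good))  # (True, 'ok')

        # Same content written as UTF-16 (with a BOM) must be rejected.
        bad = Path(tmp) / "bad.txt"
        bad.write_text("\n".join(EXPECTED_LINES), encoding="utf-16")
        print(verify_output(bad))  # (False, 'output is not valid UTF-8')
```

Because the checks compare bytes and exact strings rather than asking a model to grade the output, the same submission always receives the same verdict.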

Qualifications

Experience: 5 years of industry experience in software engineering.
Background: Proven track record at leading technology companies and/or graduation from top-tier engineering universities.
Language: Native or near-native fluency in Marathi, with a deep understanding of its grammar, register, and phrasing rules. High English proficiency.
Technical Stack: Strong proficiency in Python, standard shell scripting, and data processing.
Workflow: Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
Domain Expertise: Deep technical understanding of multilingual text processing pitfalls including:
- Encoding/decoding robustness and Unicode normalization.
- Locale-dependent conventions (collation, casing, non-Gregorian dates).
- Text I/O toolchain interoperability and safe string operations.
- (For specific languages) Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts.
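
A few of the pitfalls listed above can be shown concretely with Devanagari text in a minimal Python sketch. It relies only on Python 3's built-in `str`, `bytes`, and `unicodedata` behavior and is illustrative, not exhaustive. Grapheme-aware truncation would require a third-party library and is not shown.

```python
import unicodedata

word = "मराठी"  # "Marathi" written in Devanagari

# len() counts code points, not user-perceived characters (grapheme clusters).
# मराठी is म + र + ा (vowel sign AA) + ठ + ी (vowel sign II).
print(len(word))                  # 5
print(len(word.encode("utf-8")))  # 15: each Devanagari code point takes 3 bytes

# Naive code-point truncation can drop a dependent vowel sign and change the word.
print(word[:4])                   # मराठ (the final ी is lost)

# Truncating the encoded bytes can split a multi-byte UTF-8 sequence.
chunk = word.encode("utf-8")[:4]
try:
    chunk.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)
print(repr(chunk.decode("utf-8", errors="replace")))  # 'म\ufffd'

# Canonically equivalent strings differ at the code-point level until normalized.
composed = "\u0929"          # DEVANAGARI LETTER NNNA (precomposed)
decomposed = "\u0928\u093c"  # DEVANAGARI LETTER NA + SIGN NUKTA
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# int() accepts any Unicode decimal digits, including Devanagari ones.
print(int("१२३"))  # 123
```

Each of these behaviors can silently change results in terminal workflows (for example, when a script truncates fields, compares filenames, or parses numbers), which is why they make good targets for verifiable tasks.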

Why Collaborate with Lilt

Your schedule, your rules. As an independent contractor, work when you want, as much or as little as you want. No fixed hours, no check-ins, no micromanaging.
Get paid quickly and fairly. We respect your time and your expertise. Competitive rates, prompt payments, no chasing invoices.
Work on projects that actually matter. Contribute to cutting-edge AI and language technology that is shaping how humans and machines communicate.
Be part of something bigger. Join a global community of linguists, subject-matter experts, and language professionals who are advancing human knowledge together.
Grow without limits. As a Lilt contractor, you get access to diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains.
Have fun doing what you love. Bring your language skills to life on genuinely interesting projects.

How to join our expert community

1 - Submit your application, including an updated copy of your CV in English.

2 - Next, complete a GenAI assessment to evaluate your skills.

3 - Finalize onboarding and profile set-up in our system and become eligible for Applied AI projects.

AI is changing how the world communicates, and LILT is leading that transformation.

LILT's mission is to make the world's information available to everyone, no matter the language they speak. Join our global community of people who thrive on innovation and excellence. Our collective knowledge, uniqueness, and skills deliver multilingual AI and human-verified services to enterprises, governments, and AI developers around the world.

Earn money. Have fun. Advance human knowledge. Work on diverse projects from anywhere, any time you want. Get paid quickly and fairly, and build your professional network in a supportive community, all through a streamlined application process tailored to your expertise.

Information collected and processed as part of your application process, including any job applications you choose to submit, is subject to LILT's Privacy Policy. At LILT, we are committed to a fair, inclusive, and transparent hiring process. As part of our recruitment efforts, we may use artificial intelligence (AI) and automated tools to assist in the evaluation of applications, including résumé screening, assessment scoring, and interview analysis. These tools are designed to support human decision-making and help us identify qualified candidates efficiently and objectively. All final hiring decisions are made by people. If you have any concerns, require accommodations, or would like to opt out of the use of AI in our hiring process, please let us know at

LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to an individual's race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state, or federal laws. We are committed to the principles of fair employment and the elimination of all discriminatory practices.


About Company

LILT
