Senior Research Engineer - Evaluations

Canva

Job Location:

Any - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

At Canva, our mission is to empower the world to design. To ensure our generative AI models are truly helpful, we are seeking a talented Research Engineer to build our next-generation evaluation system by leveraging automatic evaluations.

About the role:

You will engineer sophisticated AI agents that can automatically assess the quality and human alignment of our generative design models. This high-impact role focuses on building the practical systems that make cutting-edge research effective, providing a rapid feedback loop that guides the future of design generation at Canva and ultimately empowers millions of users to create.

At the moment, this role is focused on:

Agentic Evaluation Systems: Engineering autonomous AI agents that use Multimodal Large Language Models (MLLMs) to evaluate the quality, relevance, and human alignment of generated designs.
Inference-Time Alignment: Mastering techniques that improve model outputs without full retraining, using inference-based methods including prompt engineering, in-context learning (ICL), and Retrieval-Augmented Generation (RAG).
Model Benchmarking & Analysis: Building a rigorous framework to systematically benchmark internal and external quality-understanding models, delivering clear, data-driven insights on human alignment.

Primary Responsibilities:

Design, build, and optimize the infrastructure for an MLLM-as-a-Judge evaluation system for scalable, automated feedback.
Implement and experiment with inference-time alignment techniques (prompt engineering, RAG, ICL) to directly improve model output quality.
Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks.
Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team.
Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle.
Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions.

You're probably a match if you:

Have a strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies.
Excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights.
Have successfully managed or optimized large-scale distributed model training across hundreds of GPUs.
Have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize PyTorch code for speed.
Have disciplined coding practices and are experienced with code reviews and pull requests.
Have experience working in cloud environments, ideally AWS.

Nice to Have:

Familiarity with evaluation libraries and frameworks.
Experience building or working with agentic AI systems or multi-agent coordination.
Knowledge of data visualization tools to communicate findings effectively.
A background or interest in human-computer interaction design principles.

Additional Information:

What's in it for you

Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity, and fun woven throughout life at Canva too. We also offer a stack of benefits to set you up for every success in and outside of work.

Here's a taste of what's on offer:

Equity packages - we want our success to be yours too
Health benefits plans to support you and your wellbeing
401(k) retirement plan with company contribution
Inclusive parental leave policy that supports all parents & carers
An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally

Check out for more information.

Other stuff to know

We make hiring decisions based on your experience, skills, merit, and business needs, in compliance with applicable local laws. We celebrate all types of skills and backgrounds at Canva, so even if you don't feel like your skills quite match what's listed above - we still want to hear from you!

When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process. Please note that interviews are conducted virtually.

At Canva, we value fairness, and we strive to provide competitive, market-informed compensation whilst ensuring internal equity within the team in each region. The target base salary range for this position is $220,000 - $280,000. When calculating offers, we make salary decisions based on market data, your experience levels, and internal benchmarks of your peers in the same domain and job level.

Remote Work:

Yes

Employment Type:

Full-time

Key Skills

Arm
Machine Learning
Raspberry Pi
Python
C/C++
Image Processing
5G
OS Kernels
Research Experience
Firewall
Research & Development
Computer Vision

About Company

Canva

We're a global online visual communications platform on a mission to empower the world to design. Featuring a simple drag-and-drop user interface and a vast range of templates ranging from presentations, documents, websites, social media graphics, posters, apparel to videos, plus a hu ...