About the team:
Canva's Design Generation systems use AI to create complete designs from text descriptions - turning user intent into layouts, images, typography, and color palettes. At scale, we generate millions of designs monthly, making reliable quality evaluation critical. The Design Generation Platform team (8 engineers) supports Design Generation infrastructure with a focus on developer experience, tooling, platform orchestration, and self-service capabilities. We own the plumbing that makes Design Generation systems observable, debuggable, and improvable.
Our philosophy: Platform owns orchestration, not application logic. We build reusable infrastructure that scales across Design Generation rather than solving one-off problems. We emphasize collaborative decision-making, evidence-based approaches, and building capabilities that serve researchers and engineers across Design Generation.
About the role:
As the Design Generation Evaluation owner, you'll build the infrastructure that enables quality monitoring across Design Generation. You're the expert who both understands evaluation methodologies deeply AND builds the infrastructure to scale them across the organization.
This role sits at the intersection of three critical areas:
Evaluation Strategy & Expertise: You'll guide Design Generation teams on how to evaluate their systems effectively - which methods work for different scenarios, how to set up robust test sets, when to use LLM-as-Judge vs. 1st Party Quality Models vs. user signals, and how to balance cost with detection speed. You'll establish evaluation best practices and patterns that teams can adopt.
Infrastructure & Scale: You'll build the platforms that make evaluation accessible and automated - alerting systems, continuous monitoring, production sampling, and step-level harnesses. Your goal is to enable teams to run evaluations effortlessly, eventually integrating evaluation checks into Continuous Deployment so quality gates happen automatically.
Ecosystem Integration: Canva has multiple evaluation tools. You'll articulate how these pieces fit together for Design Generation, define clear integration points, and build the connective tissue that makes the evaluation story coherent rather than fragmented.
The challenge: Quality degradation in generative systems is subtle. Designs look slightly off-brand, layouts don't quite work, and users stop publishing without clear signals. You'll need deep ML intuition to build detection systems that catch real issues while minimizing false positives. This role requires navigating ambiguity, making pragmatic architecture decisions, and building infrastructure that serves diverse evaluation needs across research and engineering.
What you'll do (responsibilities)
Understand and optimize existing evaluation systems, including LLM-as-Judge frameworks, visual quality models, and multi-dimensional scoring approaches - analyzing their strengths, limitations, and trade-offs to identify gaps in coverage and opportunities for improvement across brand adherence, visual appeal, layout quality, and functional correctness
Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost
Define evaluation strategies for different scenarios: pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison
Curate high-quality evaluation datasets and benchmark suites that represent diverse use cases, edge cases, and quality dimensions
Integrate evaluation systems into continuous deployment pipelines, creating automated quality gates that catch regressions before production
Reduce evaluation cycle time to enable teams to iterate faster on model improvements and launch experiments earlier
Partner with research teams to understand evaluation needs for new model architectures and capabilities
Define the evaluation ecosystem strategy: how different evaluation tools and methods compose together for Design Generation
Guide teams on evaluation best practices, appropriate methodologies for their use cases, and interpretation of results
What we're looking for
Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale
Proven ability to build robust, scalable infrastructure (not just models) - you're a platform engineer who speaks ML
Deep understanding of distributed systems, observability patterns, and monitoring best practices
Python proficiency with production-quality coding standards, code reviews, and testing practices
Experience with data pipelines, time-series data, and statistical analysis for detecting anomalies
SQL fluency for querying and analyzing large datasets across data warehouse and analytics systems
Track record of building self-service platforms or developer tooling that gets adoption
Excellent collaboration skills - this role requires working across teams to understand needs and deliver solutions
Experience with evaluation of Gen AI systems at scale (even better if that's evaluation of systems with creative outputs!)
Additional Information:
Don't tick all the boxes? Don't worry about that - nobody does! We'd still love to hear from you! At Canva, we know that great engineers come from a variety of backgrounds, and we value passion, curiosity, and a willingness to learn just as much as specific experience. If you're excited about this role but don't tick every box, we encourage you to apply - you might be a great fit in ways you didn't expect!
What's in it for you
Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity, and fun woven throughout life at Canva, too. We also offer a stack of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
Other stuff to know
We make hiring decisions based on your experience, skills, and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.
Remote Work:
Yes
Employment Type:
Full-time
We're a global online visual communications platform on a mission to empower the world to design. Featuring a simple drag-and-drop user interface and a vast range of templates ranging from presentations, documents, websites, social media graphics, posters, apparel to videos, plus a hu ...