Senior Machine Learning Engineer, Training Platform (AU remote)
Job Summary
About the Group/Team
We're part of the Training Platform team within Canva's AI Platform group, which sits in the Generative AI supergroup. Our team is responsible for the systems that power model training at scale, building the foundations that enable teams across Canva to create, train, and scale AI-powered experiences.
Our focus is on building reliable, efficient, and developer-friendly training infrastructure, from orchestration and distributed training systems to experimentation and platform capabilities that support large-scale AI workloads.
We enable teams across Canva to push the boundaries of what's possible with AI.
About the Role/Specialty
As a Senior Machine Learning Engineer, you'll focus on designing, scaling, and maturing the systems and infrastructure that support training workloads across Canva. You'll work on a Kubernetes-based training platform that enables distributed AI workloads across a wide range of teams, frameworks, and use cases, while also contributing to the surrounding platform capabilities that support the end-to-end training lifecycle, such as experiment management, artifact management, and other core systems needed to run AI workloads reliably and at scale. You'll help evolve these capabilities over time, improving their reliability, scalability, usability, and overall platform maturity.
You'll collaborate closely with research scientists, AI engineers, product teams, and cloud/infrastructure teams to ensure workloads can run efficiently, reproducibly, and reliably at scale. You'll also help shape the roadmap for the platform by understanding user pain points, improving platform capabilities, and contributing to the long-term direction of Canva's training infrastructure.
This role is ideal for someone who enjoys working on the systems behind AI, not just the models themselves, and wants to have broad impact across multiple teams.
What you'll do (responsibilities)
You'll contribute to the evolution of Canva's unified training platform for AI training workloads
You'll improve reliability, observability, debugging, and operational support for training systems
You'll design and build the platform capabilities that enable better scheduling at scale, including resource allocation, priority management, and quota management for training workloads
You'll collaborate closely with research scientists, ML engineers, product teams, and cloud/infrastructure teams to improve training platform workflows and outcomes
You'll contribute to system design and architecture decisions across Canva's AI Platform
You'll help shape platform roadmap and priorities based on user pain points, adoption needs, and long-term platform maturity
You'll mentor engineers and share best practices in AI systems and infrastructure
What we're looking for
You're an engineer who loves building the systems that power AI at scale. You have strong experience in training pipelines, distributed systems, or large-scale AI infrastructure, and you're excited by the challenge of making training workloads more reliable, scalable, and efficient.
You bring strong experience working with Kubernetes and containerized workloads. Experience with training infrastructure or distributed frameworks such as Ray, PyTorch distributed training, or similar technologies will be highly valuable.
You're also familiar with the modern cloud and infrastructure services that underpin high-performance AI workloads, for example high-performance storage, HPC environments, fast interconnects, and networking capabilities, or services such as FSx, EFA, and related infrastructure commonly used in large-scale training environments.
You bring a strong sense of ownership and enjoy working on complex, cross-cutting problems that impact multiple teams. You're comfortable collaborating with engineers, applied scientists, and infrastructure partners, and you care deeply about scalability, reliability, usability, and developer experience. Most importantly, you're motivated by the opportunity to help Canva build the platform foundations that enable AI-powered creativity at scale.
What the candidate will learn and how they will develop at Canva:
Deep expertise in large-scale AI training systems, Kubernetes-based workload orchestration and execution, and distributed infrastructure
Hands-on experience with modern AI training workloads at scale
Exposure to the cloud, storage, and networking capabilities required for high-performance distributed training environments
Opportunities to influence platform-wide architecture, roadmap, and AI Platform best practices
Growth through collaboration with world-class ML engineers, applied scientists, and infrastructure specialists
The ability to shape how AI is built and scaled across a global product
Additional Information:
Don't tick all the boxes? Don't worry about that - nobody does!
We'd still love to hear from you! At Canva, we know that great engineers come from a variety of backgrounds, and we value passion, curiosity, and a willingness to learn just as much as specific experience. If you're excited about this role but don't tick every box, we encourage you to apply: you might be a great fit in ways you didn't expect!
What's in it for you
Achieving our crazy big goals motivates us to work hard - and we do - but you'll experience lots of moments of magic, connectivity, and fun woven throughout life at Canva, too. We also offer a stack of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
- Equity packages - we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally
Check out for more info.
Other stuff to know
We make hiring decisions based on your experience, skills, and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.
All interviews are conducted virtually.
Remote Work:
Yes
Employment Type:
Full-time
About Company
We're a global online visual communications platform on a mission to empower the world to design. Featuring a simple drag-and-drop user interface and a vast range of templates ranging from presentations, documents, websites, social media graphics, posters, apparel to videos, plus a hu ...