Join the team redefining how the world experiences design.
Hey, g'day, mabuhay, kia ora, hallo, vítejte!
Thanks for stopping by. We know job hunting can be a little time-consuming and you're probably keen to find out what's on offer, so we'll get straight to the point.
Where and how you can work
Our flagship campus is in Sydney. We also have a campus in Melbourne and co-working spaces in Brisbane, Perth and Adelaide. But you have choice in where and how you work: we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals.
What you'd be doing in this role
As Canva scales, change continues to be part of our DNA. But we like to think that's all part of the fun. So this will give you the flavour of the type of things you'll be working on when you start, but this will likely evolve.
At the moment this role is focused on:
- Designing and implementing processes, tools, automation and libraries that service teams can use to improve the reliability of the services they own. For instance, adding a new long-awaited feature in our circuit breaker library.
- Working with product engineering teams to ensure reliability best practices and tools are rolled out in every service across the whole organization. It's not enough to create a new throttling library; we want to make sure it's successfully used in every service.
- Fostering a culture within the Engineering org that puts reliability first and establishes processes and policies that drive reliability within product engineering teams. This includes things like SLAs, error budgets, on-call response, incident resolution and observability best practices.
- Investigating production incidents in depth, then applying the learnings to code.
- Researching, developing and justifying the best choices, in the form of design docs, for tools and processes that will shape the future of reliability at Canva.
- Proposing new approaches and solutions to ensure we future-proof Canva's distributed cloud infrastructure as we scale.
- Participating in design meetings, hiring interviews and code reviews.
You're probably a match if
- You have advanced coding proficiency in Python/Java/Golang and strong object-oriented programming fundamentals.
- You have 5+ years of commercial experience developing complex distributed web applications.
- You have experience diagnosing and addressing issues across the full stack, including frontend code, backend, network/infrastructure and data layer.
- You have a solid understanding of observability principles such as metrics, logs, tracing, synthetic testing, query construction, dashboarding and alerting.
- You have experience guiding others in the principles of incident review, investigation and remedial activity.
- You have disciplined coding practices, experience with code reviews and pull requests, and a creative and conceptual problem-solving approach.
- You have strong communication and team collaboration skills, both written and verbal. As a reliability engineer, you will need to share knowledge, communicate and coordinate changes across multiple service teams.
Nice to have, not required!
- Our services and libraries are primarily written in Java 13, so experience in Java is a nice-to-have. Our platform and infrastructure tooling is primarily written in Python, Go and Terraform.
- Experience working with microservice architectures in large, containerised, distributed cloud environments (ideally AWS). We're hosted on AWS and leverage the tools they provide as much as possible.
- Experience working with data warehouse, analytics and reporting tools such as Snowflake, Mode Analytics and Looker.
About the Group
The Reliability Platform Group is responsible for providing the tools and processes to scale reliability across all Canva services. Our teams work together and with other groups to deliver preventive and detective tooling, processes and best practices that uplift Canva's reliability. We do this by driving operational excellence, reducing the impact of incidents, and providing visibility and accountability across the broader Engineering community.
This role sits within the Production Health team, whose focus is on providing tools and guidance for Canva's engineering teams to measure and maintain their systems' reliability. Their key areas of practice include on-call management, service-level management, production readiness and operational review.
What's in it for you
Achieving our crazy big goals motivates us to work hard, and we do, but you'll experience lots of moments of magic, connectivity and fun woven throughout life at Canva too. We also offer a range of benefits to set you up for every success in and outside of work.
Here's a taste of what's on offer:
- Equity packages: we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that support you personally and empower you to be a force for good and take time to recharge
Check out lifeatcanva for more info.
Other stuff to know
We make hiring decisions based on your experience, skills and passion, as well as how you can enhance Canva and our culture. When you apply, please tell us the pronouns you use and any reasonable adjustments you may need during the interview process.
We celebrate all types of skills and backgrounds at Canva, so even if you don't feel like your skills quite match what's listed above, we still want to hear from you!
Please note that interviews are conducted virtually.
Remote Work: Yes
Employment Type: Full-time