Staff Site Reliability Engineer
Job Summary
Who We're Looking For
At ScalePad, we hire thoughtful builders who want their work to matter. Our roles are designed for people who thrive on driving impact, see ambiguity as an opportunity, and believe that raising the bar is a team sport.
We don't bring people in to run playbooks. We hire people who want to rewrite them. And in this role, you'll get to do that while shaping the future of managed services for our global partners. (That's what we call our customers.)
What is ScalePad?
At ScalePad, we're building more than software; we're building confidence and clarity for the people who manage the technology businesses rely on every day.
Our mission: help MSPs evolve into MVPs (their clients' most valuable partner). Our tools turn them from reactive service providers into strategic advisors through a consistent, scalable Customer Success motion.
Our product suite unifies risk insights, client planning, and service delivery so MSPs can have smarter conversations, show clients their value, and grow their revenue.
But our purpose goes beyond our software. We're creating a workplace where curious, growth-minded people can do their best work, where ideas are valued, progress is shared, and everyone belongs. Together, we're creating a future where MSPs don't just keep businesses running; they help them thrive. We believe that when our partners succeed, we all do.
With offices in Vancouver, Toronto, Montreal, and Phoenix and a global-first mindset, ScalePad has grown into a category leader trusted by 12,000 partners across 60 countries. We've been recognized for our products and corporate culture by MSP Today, G2, and Great Place to Work, to name a few.
About the role
We're looking for a Staff Site Reliability Engineer (SRE) to be the senior technical anchor across our multi-cloud platform and developer experience. This is a hands-on, senior individual contributor role for an engineer who wants to own real systems, unblock teams day to day, and raise the bar on how engineering ships and operates at ScalePad.
You'll work directly with engineering leadership and alongside SREs across product domains. Reliability, infrastructure as code, internal tooling, and developer productivity all sit inside your scope. You'll spend your time building, operating, and improving the systems the rest of engineering depends on.
What you'll do
Get ready to go beyond order-taking. Your strategic responsibilities include:
Platform and Infrastructure
- Own production infrastructure across AWS and Azure, including networking, IAM, and cost
- Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable
- Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements
- Operate and improve CI/CD pipelines that the entire engineering org depends on
Reliability & Operational Excellence
- Operationalize SLO/SLI frameworks and observability practices alongside the SRE team
- Own incident response practice, on-call tooling, and incident review follow-through
- Reduce operational toil through automation across secret rotation, access management, and environment provisioning
- Execute on capacity planning, disaster recovery, and resilience work across critical systems
Developer Experience & Technical Influence
- Build and maintain internal developer tooling that removes friction across engineering
- Lead rollouts of AI-native tooling for code review, testing, and engineering productivity, e.g. CodeRabbit, Copilot-class assistants, and internal AI workflows
- Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems
- Partner with engineering and product leadership to identify and remove the biggest DX bottlenecks, and align infrastructure and reliability investments with business goals
- Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization
- Lead post-mortems and continuous improvement initiatives to strengthen reliability practices
Innovation & Continuous Improvement
- Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency
- Drive standardization and modernization efforts across infrastructure and operational practices
- Lead proof-of-concept and experimentation initiatives to validate new reliability solutions
What we're looking for
We care about what you can do more than where you've done it. However, experience in the following areas will help you hit the ground running in this role:
Must-haves
- 8 years of experience in software engineering, infrastructure, or related technical disciplines, with at least 5 years focused on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles
- Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices
- Experience designing and operating highly available, scalable production systems
- Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices
- Experience implementing SLO/SLI frameworks and reliability engineering methodologies
- Incident management, troubleshooting, and on-call experience in complex production environments
- Proven ability to lead large-scale technical initiatives across multiple teams
- Track record of cross-team technical influence without formal authority; excellent communication and collaboration skills with both technical and non-technical stakeholders
- Passion for mentoring engineers and improving engineering culture
- Demonstrated ability to thoughtfully integrate AI-assisted tooling into engineering and operational workflows to improve efficiency, reliability, and developer experience
Nice to Have
- Experience rolling out AI tooling in an engineering organization
- Experience leading tooling and platform migrations such as Jira, Confluence, or observability stacks
- Experience with chaos engineering practices and reliability testing
- Experience optimizing large-scale cloud infrastructure costs
Perks
ScalePad offers our employees a blend of purpose, growth, and genuinely great perks.
- Everyone's an owner. Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
- Support for growing families. Parental leave programs are in place to support you and your family when it matters most.
- Structured mentorship with builders. Join opt-in mentorship programs and learn directly from founders and senior leaders who've scaled multiple SaaS ventures and spent decades in the MSP industry.
- Invest in your growth every year. Access an annual professional development budget to level up your skills, your career, and your impact.
- Set yourself up with great tools. Work with brand new, top-of-the-line hardware and equipment so you can do your best work, whether you're at home or in one of our hubs.
- Modern ways of working. Roles at ScalePad are structured as remote or hybrid, with hub locations in Vancouver, Toronto, Montreal, and Phoenix. Specific work models are outlined in each posting.
- Support for hybrid life. Receive a monthly stipend to help you create an effective hybrid or remote work environment.
- Well-being and time to recharge. Take care of yourself with 100% employer-paid benefits.
Before You Apply
This is a full-time role for those who are eligible to work in Canada. We thank all applicants for taking the time to apply, but only candidates who make it to the next stage will be contacted.
Note on AI Use: ScalePad uses AI technology to support certain administrative aspects of our hiring process, such as transcription, note-taking, and interview documentation. These tools are strictly used to assist our team and have no influence on candidate evaluation or hiring decisions.
No recruiters, please.
Required Experience:
Staff IC
About Company
Lifecycle Insights is an MSP platform used to assess clients, create reports and budgets, track remediation efforts, and demonstrate value to clients.