Site Reliability Engineer

Kaseya Careers

Job Location:

Toronto - Canada

Posted on: 9 hours ago

Vacancies: 1 Vacancy

Job Summary

About Kaseya

Kaseya is the leading provider of AI-powered IT management and cybersecurity software, serving Managed Service Providers (MSPs) and internal IT organizations worldwide. Our comprehensive platform helps organizations efficiently manage, secure, and automate their IT environments, driving operational efficiency and long-term business success.

Backed by Insight Partners, a leading global software investor, Kaseya has experienced sustained double-digit growth and continues to expand its global footprint. Today, Kaseya supports customers in more than 20 countries and manages over 15 million endpoints worldwide.

Founded in 2000, Kaseya has built a culture centered around innovation, accountability, and results. We are a high-growth, high-performance organization that values individuals who are driven, adaptable, and committed to delivering exceptional outcomes for our customers and teammates alike.

At Kaseya, success comes from embracing challenges, moving with urgency, and continuously raising the bar.

Kaseya is hiring a Site Reliability Engineer to keep our production systems healthy as we scale. You'll own the reliability of services that thousands of MSPs depend on every day. That means defining the SLOs we hold ourselves to, leading incidents when they happen, and building the automation that keeps things stable as we ship. The work is hands-on, the on-call rotation is real, and the environment runs heavily on AWS. If you treat reliability as a product instead of a chore, you'll fit in well here.

What You'll Do

Set, monitor, and enforce SLOs, SLIs, and error budgets that keep our systems reliable
Lead incident response, troubleshooting, and blameless postmortems that produce real fixes
Build and maintain automated deployment, configuration management, and infrastructure provisioning using Infrastructure as Code
Manage cloud and hybrid infrastructure with Terraform or CloudFormation, balancing cost, scalability, and resilience
Improve observability across systems through proactive monitoring, alerting, and dashboards that surface issues early
Partner with development teams to bake reliability into the SDLC, including deployment automation, capacity planning, and chaos engineering
Cut operational toil through automation, systems that recover themselves, and engineering solutions that scale
Support containerized and serverless workloads so they stay highly available and fault tolerant in production
Stay current on SRE, cloud, and observability practices and bring what works back to the team

Required Qualifications

4 to 5 years of AWS production experience
IaC ownership with Terraform or CloudFormation, including state management
AWS ECS production experience (or a strong Kubernetes background and willingness to ramp up)
Active on-call rotation experience, with incidents led and postmortems written
Working fluency with SLOs, SLIs, and error budgets in production

Preferred Qualifications

Kubernetes production experience
Broader observability tooling (Datadog, Dynatrace, CloudWatch, Elasticsearch/Kibana)
Chaos engineering
AWS Lambda or serverless workloads
Ansible, Chef, or Puppet
DevSecOps work (vulnerability scanning, secrets management, SOC 2 or ISO 27001)
Production database support (RDS, PostgreSQL, MySQL)
Open source contributions or a public technical portfolio

The expected annual base salary for this role is CAD $115,000 to CAD $130,000. The final offer will depend on experience, skills, and internal equity. This posting is for an existing vacancy.

Additional information
Kaseya provides equal employment opportunity to all employees and applicants without regard to race, religion, age, ancestry, gender, sex, sexual orientation, national origin, citizenship status, physical or mental disability, veteran status, marital status, or any other characteristic protected by applicable law.
