Site Reliability Engineer (SRE)

Toronto - Canada

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

This job posting is outdated and the position may be filled.

Job Summary

$60/hr CAD

Glider MUST

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services.

You will play a critical role in ensuring the high availability, performance, scalability, and security of our production systems while enabling continuous deployment and rapid delivery of features to our customers.

Key Responsibilities:

Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
Develop and improve observability using monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).
Maintain disaster recovery plans, backups, and high-availability strategies.

Qualifications:

Required:

5 years of experience as an SRE, DevOps Engineer, or in a similar role.
Proficiency in scripting and automation (Bash, Python, Go, etc.).
Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
Solid understanding of Linux systems administration and networking fundamentals.
Experience with cloud platforms (AWS, Azure, or GCP).
Experience with IaC tools like Terraform or CloudFormation.
Familiarity with GitOps and modern deployment practices.
Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).
Strong troubleshooting and incident response skills.

Preferred:

Experience in a high-traffic, microservices-based architecture.
Exposure to service meshes (Istio, Linkerd).
Certifications (AWS Certified DevOps Engineer, CKA, etc.).
Experience with security automation and compliance (e.g., SOC 2, ISO 27001).

Soft Skills:

Strong communication and collaboration abilities.
Ability to thrive in a fast-paced, agile environment.
Analytical mindset and proactive approach to problem-solving.
A passion for automation, performance, and system design.

Skills

Azure, Prometheus, Terraform
