Site Reliability Engineer (SRE)

Cloudious LLC

Job Location:

Toronto - Canada

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

$60/hr CAD

Glider MUST

We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services.

You will play a critical role in ensuring the high availability, performance, scalability, and security of our production systems while enabling continuous deployment and rapid delivery of features to our customers.

Key Responsibilities:

  • Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
  • Develop and improve observability using monitoring, logging, and tracing tools (e.g., Prometheus, Grafana, ELK, Datadog).
  • Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
  • Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
  • Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
  • Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
  • Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
  • Implement and maintain security best practices across environments (e.g., secrets management, IAM, firewalls).
  • Maintain disaster recovery plans, backups, and high-availability strategies.

Qualifications:

Required:

  • 5 years of experience as an SRE, DevOps Engineer, or similar role.
  • Proficiency in scripting and automation (Bash, Python, Go, etc.).
  • Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
  • Solid understanding of Linux systems administration and networking fundamentals.
  • Experience with cloud platforms (AWS, Azure, or GCP).
  • Experience with IaC tools like Terraform or CloudFormation.
  • Familiarity with GitOps and modern deployment practices.
  • Hands-on experience with observability tools (e.g., Prometheus, Grafana, Datadog).
  • Strong troubleshooting and incident response skills.

Preferred:

  • Experience in a high-traffic, microservices-based architecture.
  • Exposure to service meshes (Istio, Linkerd).
  • Certifications (AWS Certified DevOps Engineer, CKA, etc.).
  • Experience with security automation and compliance (e.g., SOC 2, ISO 27001).

Soft Skills:

  • Strong communication and collaboration abilities.
  • Ability to thrive in a fast-paced, agile environment.
  • Analytical mindset and proactive approach to problem-solving.
  • A passion for automation, performance, and system design.

Skills

Azure, Prometheus, Terraform

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting