$60/hr CAD
Glider MUST
Job Summary:
We are seeking an experienced Site Reliability Engineer (SRE) with advanced DevOps expertise to help build, scale, and maintain our infrastructure and services.
You will play a critical role in ensuring the high availability, performance, scalability, and security of our production systems while enabling continuous deployment and rapid delivery of features to our customers.
Key Responsibilities:
- Design, build, and maintain reliable, scalable, and secure cloud-based infrastructure (AWS, Azure, or GCP).
- Develop and improve observability using monitoring, logging, and tracing tools (e.g. Prometheus, Grafana, ELK, Datadog).
- Automate repetitive tasks and infrastructure using Infrastructure-as-Code (Terraform, CloudFormation, Pulumi).
- Create and maintain CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD, etc.) to support fast and safe delivery.
- Lead incident response, root cause analysis, and postmortems to ensure high uptime and rapid recovery.
- Optimize system performance, reliability, and cost-effectiveness through proactive monitoring and tuning.
- Collaborate with software engineering teams to define SLAs/SLOs and improve service reliability.
- Implement and maintain security best practices across environments (e.g. secrets management, IAM, firewalls).
- Maintain disaster recovery plans, backups, and high-availability strategies.
Qualifications:
Required:
- 5 years of experience as an SRE, DevOps Engineer, or in a similar role.
- Proficiency in scripting and automation (Bash, Python, Go, etc.).
- Strong experience with containerization and orchestration (Docker, Kubernetes, Helm).
- Solid understanding of Linux systems administration and networking fundamentals.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience with IaC tools like Terraform or CloudFormation.
- Familiarity with GitOps and modern deployment practices.
- Hands-on experience with observability tools (e.g. Prometheus, Grafana, Datadog).
- Strong troubleshooting and incident response skills.
Preferred:
- Experience in a high-traffic, microservices-based architecture.
- Exposure to service meshes (Istio, Linkerd).
- Certifications (AWS Certified DevOps Engineer, CKA, etc.).
- Experience with security automation and compliance (e.g. SOC2, ISO27001).
Soft Skills:
- Strong communication and collaboration abilities.
- Ability to thrive in a fast-paced, agile environment.
- Analytical mindset and proactive approach to problem-solving.
- A passion for automation, performance, and system design.
Skills:
Azure, Prometheus, Terraform