Role: Site Reliability Engineer
Location: Montreal, QC (Onsite)
Type: Contract
Job Responsibilities:
- Building and maintaining front-to-back knowledge of the development environment.
- Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with dev-side peers.
- Reduction of the cost of support (hours of effort) through the elimination of operational issues, optimization and automation of tasks, development of operational tools, and driving client self-service to minimize constraints.
- Identification and prioritization of technical debt that impacts client (i.e., software developer) productivity, system reliability, or the efficiency of the Operations Team.
- Collaboration with other SREs to share solutions.
- Complex troubleshooting of front-to-back development environment issues.
- Maximizing Ops team product knowledge and support capabilities to minimize the escalation rate to the department's feature engineers/developers.
- Consulting with clients to maximize their productivity, including troubleshooting their issues and providing solutions.
- Experimentation with new tools and techniques.
- Being operationally responsive, including sharing the on-call rotation with the rest of the global team (time-off-in-lieu system).
Required Qualifications & Skills:
- Strong Linux troubleshooting skills
- Task automation experience in any programming language, preferably Python
- Practical experience implementing monitoring / observability solutions using Prometheus and Grafana
- Experience using version control (Bitbucket, GitHub), issue tracking (Jira), continuous integration (Jenkins, Azure DevOps), automated testing, or deployment automation.
- Excellent communication skills to work with peers and third-party vendors.
- Confident collaboration skills
Desired Skills:
- Experience with site reliability engineering practices such as service level objectives (SLOs), error budgets, blameless postmortems, and toil reduction
- Experience with Docker / Kubernetes