Site Reliability Engineer Cloud, Dynatrace

Toronto - Canada

Monthly Salary: CAD 10 - 10

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Essential Skills

Site Reliability Engineer (SRE)

Amazon Web Services (AWS) Cloud Computing

GitHub Enterprise

Role Descriptions

Site Reliability Engineer (SRE) with expertise in Dynatrace monitoring, log investigation, and observability practices.

The ideal candidate will have a deep understanding of business processes and upstream-downstream dependencies, along with the ability to design and implement dashboards with SLOs and SLAs that align with business objectives.

Key Responsibilities

Monitoring and Observability

Configure and maintain Dynatrace for application and infrastructure monitoring.

Develop custom dashboards, alerts, and reports to track system health and performance.

Define and implement Service Level Objectives (SLOs) and Service Level Agreements (SLAs).

Log Analysis and Troubleshooting

Perform log investigation using tools like Splunk, ELK, or similar platforms.

Identify root causes of incidents and provide actionable insights for resolution.

Business Understanding

Analyze business models, workflows, and critical application flows.

Map upstream and downstream dependencies to ensure end-to-end reliability.

Incident Management

Participate in on-call rotations and respond to production incidents.

Drive post-incident reviews and implement preventive measures.

Automation and Optimization

Automate monitoring and alerting processes to reduce manual intervention.

Collaborate with development teams to improve system reliability and performance.

Required Skills and Qualifications

Technical Expertise

Strong experience with Dynatrace (configuration, dashboards, and problem detection).

Proficiency in log analysis tools (Splunk, ELK, or equivalent).

Solid understanding of SRE principles, observability, and incident management.

Business and Analytical Skills

Ability to understand business processes and translate them into technical monitoring solutions.

Experience in mapping application dependencies and creating impact analyses.

Soft Skills

Excellent communication and collaboration skills.

Strong problem-solving and analytical mindset.

Preferred Experience

Experience with cloud platforms (AWS, Azure, GCP).

Familiarity with CI/CD pipelines and automation scripting.

Performance Metrics

Uptime and reliability improvements.

Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).

Accuracy and relevance of dashboards and alerts.

Compliance with defined SLOs and SLAs.

Experience Required: 8-10 years
