Site Reliability Engineer Dynatrace, Splunk, Cloud Platform

Job Location: Toronto, Canada

Monthly Salary: Not Disclosed
Experience Required: 5 years
Posted on: 5 hours ago
Vacancies: 1

Job Summary

Site Reliability Engineer (SRE) with expertise in Dynatrace monitoring, log investigation, and observability practices. The ideal candidate will have a deep understanding of business processes and upstream and downstream dependencies, and the ability to design and implement dashboards with SLOs and SLAs that align with business objectives.

Key Responsibilities

Monitoring and Observability
  • Configure and maintain Dynatrace for application and infrastructure monitoring.
  • Develop custom dashboards, alerts, and reports to track system health and performance.
  • Define and implement Service Level Objectives (SLOs) and Service Level Agreements (SLAs).

Log Analysis and Troubleshooting
  • Perform log investigation using tools like Splunk, ELK, or similar platforms.
  • Identify root causes of incidents and provide actionable insights for resolution.

Business Understanding
  • Analyze business models, workflows, and critical application flows.
  • Map upstream and downstream dependencies to ensure end-to-end reliability.

Incident Management
  • Participate in on-call rotations and respond to production incidents.
  • Drive post-incident reviews and implement preventive measures.

Automation and Optimization
  • Automate monitoring and alerting processes to reduce manual intervention.
  • Collaborate with development teams to improve system reliability and performance.

Required Skills and Qualifications

Technical Expertise
  • Strong experience with Dynatrace (configuration, dashboards, problem detection).
  • Proficiency in log analysis tools (Splunk, ELK, or equivalent).
  • Solid understanding of SRE principles, observability, and incident management.

Business and Analytical Skills
  • Ability to understand business processes and translate them into technical monitoring solutions.
  • Experience in mapping application dependencies and creating impact analyses.

Soft Skills
  • Excellent communication and collaboration skills.
  • Strong problem-solving and analytical mindset.

Preferred
  • Experience with cloud platforms (AWS, Azure, GCP).
  • Familiarity with CI/CD pipelines and automation scripting.

Performance Metrics
  • Uptime and reliability improvements.
  • Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
  • Accuracy and relevance of dashboards and alerts.
  • Compliance with defined SLOs and SLAs.

Experience required: 10


Required Skills:

BASEL

Company Industry

IT Services and IT Consulting

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting