Essential Skills
Site Reliability Engineer (SRE)
Amazon Web Services (AWS) Cloud Computing
Role Descriptions
Site Reliability Engineer (SRE) with expertise in Dynatrace monitoring, log investigation, and observability practices.
The ideal candidate will have a deep understanding of business processes and upstream and downstream dependencies, and the ability to design and implement dashboards with SLOs and SLAs that align with business objectives.
Key Responsibilities
Monitoring and Observability
Configure and maintain Dynatrace for application and infrastructure monitoring.
Develop custom dashboards, alerts, and reports to track system health and performance.
Define and implement Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
Log Analysis and Troubleshooting
Perform log investigation using tools like Splunk, ELK, or similar platforms.
Identify root causes of incidents and provide actionable insights for resolution.
Business Understanding
Analyze business models, workflows, and critical application flows.
Map upstream and downstream dependencies to ensure end-to-end reliability.
Incident Management
Participate in on-call rotations and respond to production incidents.
Drive post-incident reviews and implement preventive measures.
Automation and Optimization
Automate monitoring and alerting processes to reduce manual intervention.
Collaborate with development teams to improve system reliability and performance.
Required Skills and Qualifications
Technical Expertise
Strong experience with Dynatrace (configuration, dashboards, and problem detection).
Proficiency in log analysis tools (Splunk, ELK, or equivalent).
Solid understanding of SRE principles, observability, and incident management.
Business and Analytical Skills
Ability to understand business processes and translate them into technical monitoring solutions.
Experience in mapping application dependencies and creating impact analysis.
Soft Skills
Excellent communication and collaboration skills.
Strong problem-solving and analytical mindset.
Preferred Experience
Experience with cloud platforms (AWS, Azure, GCP).
Familiarity with CI/CD pipelines and automation scripting.
Performance Metrics
Uptime and reliability improvements.
Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
Accuracy and relevance of dashboards and alerts.
Compliance with defined SLOs and SLAs.
Experience Required: 8-10 years
Required Skills: Site Reliability Engineer (SRE), Amazon Web Services (AWS) Cloud Computing, GitHub Enterprise
IT Services and IT Consulting