We are seeking a highly skilled Site Reliability Engineer (SRE) with strong observability expertise, proven communication skills, and the ability to drive reliability maturity across multi-team environments. This role is ideal for someone who can blend deep technical proficiency with strategic thinking and collaborative influence.
Key Responsibilities
Observability Engineering
Design, scale, optimize, and manage Prometheus and Grafana environments.
Write advanced PromQL queries and build dashboards, visualizations, and metric-based calculations.
Build out and maintain Grafana instances supporting multi-team use cases.
Leverage Dynatrace, with strong proficiency in metrics and analytics, to deliver efficient, actionable observability solutions for engineering and operations teams (e.g., dashboards, insights, reports).
Analyze telemetry data to identify the metrics that matter (MTM), drive actionable insights, and influence engineering decisions.
Site Reliability Engineering
Apply and evolve an SRE Maturity Model to help teams mature across observability, resilience, automation, and reliability.
Establish, implement, and maintain Service Level Objectives (SLOs) and error budgets across applications and services.
Partner effectively with engineering, product, operations, and leadership teams; translate complex technical insights into clear, actionable communication.
Identify and reduce toil through automation, tooling improvements, and process refinement.
Support incident analysis, reliability reviews, and continuous improvement initiatives.
Required Skills & Experience
Familiarity with SRE principles, maturity models, and reliability roadmaps.
Demonstrated experience improving application reliability via data-driven decisions.
Hands-on experience with Prometheus, Grafana, and PromQL.
Strong understanding of Dynatrace, metric analysis, and observability practices.