Role Overview
We are seeking a highly skilled Senior Observability Engineer to help optimize and standardize our Grafana Cloud ecosystem. This role is critical for reducing operational expenditure through efficient platform configuration and for establishing the observability framework for the TSA separation programme.
The ideal candidate is a subject matter expert in the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) with a proven track record of implementing cost-effective, high-performance monitoring solutions.
Key Responsibilities
1. Grafana Cloud Optimization (Cost & Performance)
- Cost Optimization: Implement strategies to decrease monthly expenditure, including optimizing queries, refining data retention policies, and eliminating redundant data ingestion.
- Dashboard & Alerting: Enhance monitoring quality by refining existing alerts and creating high-impact, insightful dashboards that improve system stability.
2. Strategy & Best Practices
- Standardization: Develop and document organization-wide best practices for configuring Grafana Cloud, Prometheus, and Loki.
- Governance & Security: Implement robust Role-Based Access Controls (RBAC) to mitigate security vulnerabilities and prevent unauthorized access to sensitive log data.
- Migration Roadmap: Establish foundational observability guidelines to ensure the TSA separation is launched with consistent and effective monitoring.
3. Platform Architecture & Collaboration
- Foundation Project: Collaborate with the Osttra Platform teams to define core observability components, including logging, metrics, and tracing standards.
- Architectural Design: Contribute to the design of scalable observability solutions that will be integrated into the core platform architecture.
- Knowledge Transfer: Mentor internal teams to foster long-term observability expertise and ensure the sustainability of the new standards.
Technical Qualifications (L3 Requirements)
- Expert-level Grafana Cloud: Extensive experience managing Grafana Cloud at scale, with a specific focus on cost management and performance tuning.
- Observability Stack: Deep technical proficiency in Prometheus (metrics), Loki (logging), and Tempo (tracing).
- Data Strategy: Proven ability to manage complex data ingestion pipelines and optimize cardinality to reduce cloud costs.
- Security Mindset: Practical experience implementing secure access controls and compliance standards within observability platforms.
- Infrastructure as Code: Experience defining observability components as code to support automated platform foundations.