Key Responsibilities:
- Lead the design, configuration, deployment, and administration of Grafana in large-scale production environments.
- Develop and maintain advanced, user-friendly dashboards using PromQL, LogQL, and other query languages.
- Integrate Grafana with a variety of data sources, including Prometheus, Loki, CloudWatch, Elasticsearch, and InfluxDB.
- Establish monitoring standards, governance, and best practices across multiple business units and regions.
- Manage role-based access, organizations, and plugin configurations in Grafana Enterprise or Grafana Cloud.
- Collaborate with cross-functional teams (DevOps, SRE, Cloud, Security) to improve observability, incident response, and performance monitoring.
- Ensure uptime, availability, and scalability of monitoring tools.
- Automate infrastructure and dashboard deployments using CI/CD pipelines, Terraform, Ansible, or GitOps workflows.
- Troubleshoot and optimize existing monitoring setups to reduce alert noise and increase the share of actionable alerts.
- Maintain security and compliance standards across monitoring systems.
Required Skills & Qualifications:
- 8 years of hands-on experience in Grafana administration, with proven success in complex, large-scale environments.
- Strong background in Linux system administration (RHEL, Ubuntu, CentOS).
- Solid experience working with AWS cloud infrastructure (EC2, CloudWatch, IAM, S3, Lambda, etc.).
- Deep understanding of observability concepts: metrics, logs, and traces.
- Proficiency in scripting (Bash, Python, etc.) and automation.
- Familiarity with container orchestration platforms such as Kubernetes.
- Experience working in Agile and DevOps cultures.
- Strong analytical and communication skills, with the ability to explain technical details to non-technical stakeholders.