Staff Site Reliability & DevOps Engineer, Observability
Job Summary
At Cision, we believe in empowering every individual to make an impact. Here, your voice is heard, your ideas are valued and your unique perspective fuels our collective success. As part of our global team, you'll thrive in an environment that champions curiosity, collaboration and innovation, all while making meaningful contributions to the brands we accelerate.
Join us in shaping the future of communication and building authentic connections that matter. Whether you're solving complex problems or driving bold innovations, your growth is our success, and together we'll create the conversations of tomorrow.
Empower your impact at Cision. Be seen, be understood, be you.
This role focuses on designing, operating and evolving observability platforms, with a strong emphasis on metrics, logging and alerting. The primary tooling is Grafana and Prometheus, with responsibility for ensuring production systems are observable, reliable and operable at scale. The role works closely with platform, infrastructure and application teams.
Key responsibilities:
Design, build and operate observability platforms based on Grafana and Prometheus
Define and maintain metrics standards, dashboards, alerts and SLOs
Improve signal quality: reduce alert noise, tune thresholds and improve runbooks
Support incident response by providing actionable telemetry and post-incident analysis
Integrate metrics, logs and traces across distributed systems
Work with engineering teams to instrument services correctly
Automate observability configuration using infrastructure as code
Contribute to reliability improvements through capacity planning and performance analysis
Required skills and experience
Strong experience with Prometheus (scraping, federation, recording rules, alerting)
Strong experience with Grafana (dashboards, alerting, templating, RBAC)
Solid Linux and networking fundamentals
Experience running observability stacks in Kubernetes environments
Infrastructure as code experience (Terraform preferred)
Familiarity with incident management and on-call practices
Ability to debug production systems using metrics and logs
Nice to have:
Experience with logs and traces (e.g. Loki, Tempo, OpenTelemetry)
Experience operating large-scale or multi-cluster Kubernetes platforms
Experience with cloud platforms (GCP, AWS, OCI)
Exposure to SRE concepts such as error budgets and SLO-driven prioritisation
What success looks like
Engineers trust dashboards and alerts to reflect system health
Incidents are detected earlier and diagnosed faster
Alert fatigue is reduced and on-call quality improves
Observability is treated as a first-class platform capability
As a global leader in PR, marketing and social media management technology and intelligence, Cision helps brands and organizations to identify, connect and engage with customers and stakeholders to drive business. Newswire, a network of over 1.1 billion influencers, in-depth monitoring, analytics and media platforms headline a premier suite of solutions. Cision has offices in 24 countries throughout the Americas, EMEA and APAC. For more information about Cision's award-winning solutions, including its next-gen Cision Communications Cloud, follow @Cision on Twitter.
Cision is committed to fostering an inclusive environment where all employees can be their authentic selves and perform at their best. We believe diversity, equity and inclusion is vital to driving our culture, sparking innovation and achieving long-term success. Cision is proud to have joined more than 600 companies in signing the CEO Action for Diversity & Inclusion pledge and to have been named a Top Diversity Employer for 2021.
Cision is proud to be an equal opportunity employer, seeking to create a welcoming and diverse environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status or other protected statuses.
Cision is committed to the full inclusion of all qualified individuals. In keeping with our commitment, Cision will take the steps to assure that people with disabilities are provided reasonable accommodations. Accordingly, if reasonable accommodation is required to fully participate in the job application or interview process, to perform the essential functions of the position, and/or to receive all other benefits and privileges of employment, please contact
Please review our Global Candidate Data Privacy Statement to learn about Cision's commitment to protecting personal data collected during the hiring process.
Required Experience:
Staff IC