Engineering Manager, SRE Observability

Zendesk


Job Location:

Kraków - Poland

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Job Description

As an Engineering Manager specializing in Observability, you will lead and scale a highly skilled team responsible for architecting, building, and evolving enterprise-grade monitoring, alerting, and incident response systems. Leveraging your deep expertise with observability tools such as Datadog, Grafana, Loki, and others, you will drive our transformation from reactive firefighting to proactive reliability engineering at scale. Your mission is to empower engineering teams by providing the right visibility and tooling to ensure system health, availability, and performance.

You will collaborate closely with Product Management and Technical Leads to define and execute a strategic roadmap that addresses the challenges of monitoring complex, large-scale distributed systems in a cloud-native environment. This role demands a hands-on engineering leader who understands the nuances of telemetry data, visualization, alerting, reliability, and cost-efficient observability architectures in enterprise settings.

What You'll Be Doing

  • Recruit, mentor, and retain top engineering talent specialized in observability and reliability engineering.

  • Directly contribute to the design and implementation of observability solutions alongside your team, maintaining a high bar for technical excellence.

  • Own and evolve the end-to-end observability stack and operational processes, including metrics, traces, logs, dashboards, and alerting.

  • Partner with SRE, DevOps, and platform teams to integrate and extend observability tooling across diverse services running at large scale.

  • Lead roadmap planning for observability infrastructure and tooling in partnership with Product and Engineering leadership.

  • Establish best practices for instrumentation, data collection, alerting thresholds, and incident response workflows to elevate the organization's reliability posture.

  • Identify gaps and weaknesses in monitoring coverage and performance; proactively drive improvements and automation.

  • Collaborate cross-functionally with teams across the enterprise to influence observability adoption, standardization, and innovation.

  • Foster a culture of continuous learning, high team engagement, and technical craftsmanship within your team.

  • Communicate technical strategy, progress, risks, and impact effectively with stakeholders at all levels.

What You Bring to the Role

  • Deep hands-on experience with commercial and open-source observability tools, including Datadog, Grafana, Loki, and related telemetry technologies.

  • Proven track record managing observability or SRE teams within large, complex enterprise environments.

  • Strong understanding of distributed systems, cloud-native architectures (Kubernetes, AWS), and how observability fits into scalable operations.

  • Ability to provide technical leadership while actively contributing to engineering solutions and troubleshooting.

  • Expertise in designing scalable, reliable telemetry pipelines and intelligent alerting to reduce alert noise and incident toil.

  • Demonstrated skill in building and improving observability platforms that serve multiple engineering teams and business units.

  • Effective communicator and collaborator, able to bridge engineering, product, and business stakeholders.

  • Commitment to developing team members through coaching, feedback, and career growth opportunities.

  • Experience driving cultural change in organizations towards proactive reliability engineering and data-driven decision making.

  • 3 years of people management experience leading engineering teams.

  • Deep domain expertise in Observability, with hands-on experience in tools like Datadog, Grafana, Loki, etc.

  • Significant experience working in or managing engineering teams within large-scale enterprise companies.

  • Proven ability to hire, mentor, and retain high-performing engineers.

  • Strong collaboration skills to influence cross-functional teams in large engineering organizations.

  • Experience with distributed systems and cloud environments (AWS, Kubernetes).

Preferred

  • Background leading Observability-focused teams.

  • Hands-on experience operating telemetry systems for large-scale Kubernetes and AWS workloads.

  • Passion for innovation, continuous learning, and championing a growth mindset.

  • Experience managing geographically distributed teams.

Our Tech Environment

  • Primarily AWS cloud infrastructure with Kubernetes orchestration.

  • Codebase spans Ruby, Go, and Python.

  • Data storage includes AWS Aurora (MySQL), S3, and Kafka streaming.

  • Observability responsibilities include balancing operational maintenance, tooling innovation, and incident support.


The Poland annualized base salary range for this position is zł297,000.00-zł445,000.00. Please note that while the salary range represents the minimum and maximum base salary rate for this position, the actual compensation offered will be based on job-related capabilities, applicable experience, and other relevant factors. This position may also be eligible for bonus, benefits, or related incentives that will be communicated during the offer stage.

Hybrid: In this role, our hybrid experience is designed at the team level to give you a rich onsite experience packed with connection, collaboration, learning, and celebration - while also giving you flexibility to work remotely for part of the week. This role must attend our local office for part of the week. The specific in-office schedule is to be determined by the hiring manager.

The intelligent heart of customer experience

Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.

Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person, at one of our many Zendesk offices around the world, to connect, collaborate, and learn, whilst also giving our people the flexibility to work remotely for part of the week.

As part of our commitment to fairness and transparency, we inform all applicants that artificial intelligence (AI) or automated decision systems may be used to screen or evaluate applications for this position, in accordance with Company guidelines and applicable law.

Zendesk is an equal opportunity employer, and we're proud of our ongoing efforts to foster global diversity, equity, & inclusion in the workplace. Individuals seeking employment and employees at Zendesk are considered without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, disability, military or veteran status, or any other characteristic protected by applicable law. We are an AA/EEO/Veterans/Disabled employer. If you are based in the United States and would like more information about your EEO rights under the law, please click here.

Zendesk endeavors to make reasonable accommodations for applicants with disabilities and disabled veterans pursuant to applicable federal and state law. If you are an individual with a disability and require a reasonable accommodation to submit this application, complete any pre-employment testing, or otherwise participate in the employee selection process, please send an e-mail to with your specific accommodation request.


Required Experience:

Manager


Key Skills

  • Hospitality Experience
  • Go
  • Management Experience
  • React
  • Redux
  • Node.js
  • AWS
  • Mechanical Engineering
  • Team Management
  • Leadership Experience
  • Mentoring
  • Distributed Systems