Site Reliability Engineer - Observability

Job Location

Berlin - Germany

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

About Wolt

At Wolt, we create technology that brings joy, simplicity, and earnings to the neighborhoods of the world. In 2014 we started with delivery of restaurant food. Now we're building the delivery of (almost) everything, and you'll find us in over 500 cities in 30 countries around the world. In 2022 we joined forces with DoorDash, and together we keep on dreaming big and expanding across the globe.

Working at Wolt isn't always easy, but it's definitely exciting. Here you'll learn more, build more, and ship more than in most other companies. You'll be challenged a lot but also have a lot of fun on the way. So if you're a self-starter with drive and entrepreneurial spirit, this could be the ride of your life.

Wolt is powered by a cutting-edge platform managed by specialized teams within our Core Group.
One of these teams is the Observability Engineering Team, dedicated to ensuring visibility, reliability, and performance across Wolt's services and infrastructure at scale.

As a Software Engineer in Observability, you'll play a key role in developing scalable observability and reliability tooling, maintaining performance test frameworks, and improving Wolt's overall system health. This role is ideal for engineers with a strong foundation in software development and exposure to Site Reliability Engineering (SRE) practices.

Our team manages an Observability platform that processes billions of metrics, traces, and log entries monthly, supporting all Wolt engineers in monitoring and improving the health of their services. We maintain a robust ecosystem covering application instrumentation, telemetry data collection, visualization, and alerting. Additionally, we are collaborating with DoorDash to build the next-generation observability platform, designed to enhance visibility, scalability, and operational efficiency across both organizations. If you're passionate about building reliable systems at scale, we'd love to hear from you!

What you'll do:

  • Design and develop scalable software solutions and tooling to improve observability and reliability across Wolt's services, with a focus on empowering teams to monitor and debug effectively.
  • Contribute to initiatives focused on architecting, building, and maintaining the observability stack so that it handles increasing telemetry data efficiently and with greater reliability.
  • Take ownership of key initiatives to improve the quality, efficiency, and reliability of our observability stack.
  • Contribute to and advocate for SRE principles to improve system availability, performance, and efficiency, ensuring that reliability is embedded across all layers of Wolt's services.
  • Build and own tooling and frameworks that enable teams to improve reliability, optimize system performance, and manage incidents more effectively.
  • Collaborate closely with engineering teams to implement observability best practices, integrate reliability tooling, and resolve complex production issues.
  • Participate in on-call rotations, drive root cause analysis, and build automated detection and resolution tools to reduce mean time to recovery (MTTR) within the observability domain and its systems.
  • Document and share knowledge through guides, playbooks, and training sessions while continuously improving the developer experience with self-service tooling and best practices.

Qualifications:

  • Strong foundation in software engineering with experience designing and building distributed systems.
  • Proficiency in Go (preferred) or Python with a focus on building automation and developer tooling.
  • Experience operating and understanding observability platforms at scale.
  • Hands-on experience architecting and maintaining an observability stack at scale, based on open source tools such as Prometheus, Grafana, Elasticsearch, or similar.
  • Solid understanding of SRE principles, including incident response, fault-tolerant architecture, and service-level indicators and objectives (SLIs/SLOs).
  • Comfort working in large-scale distributed environments, with expertise in Kubernetes, container orchestration, and resolving issues in complex cloud-native systems.
  • Familiarity with cloud platforms (AWS preferred; GCP or Azure).
  • Strong troubleshooting and problem-solving skills in complex systems.
  • Excellent collaboration and communication skills.

Nice to Haves:

  • Hands-on experience with OpenTelemetry and modern observability frameworks at scale.
  • Experience with large-scale distributed databases and/or event streaming platforms such as Kafka or ClickHouse.
  • Contributions to open-source projects in observability or platform engineering, especially CNCF projects.

This role can be based in one of our tech hubs in Helsinki, Berlin, or Stockholm, or you can work remotely anywhere in Finland, Sweden, Germany, Denmark, and Estonia. Read more about our remote setup here. If you live outside of these countries, not to worry! We provide relocation support to help you make your way to Finland, Germany, or Sweden.

The position will be filled as soon as we find the right people, so feel free to apply as soon as you feel like hearing more about the position and potentially joining Wolt & DoorDash!

Employment Type

Full Time
