Site Reliability Engineer

Deepset

Job Location:

Berlin - Germany

Monthly Salary: Not Disclosed
Posted on: 4 hours ago
Vacancies: 1 Vacancy

Job Summary

TL;DR

We're hiring a Site Reliability Engineer to own and evolve deepset's cloud and customer infrastructure end to end. You'll work across SaaS, private cloud, and on-prem environments to make our self-hosted platform production-ready, drive CI/CD and GitOps maturity, and reduce complexity at scale. Your work will directly shape how deepset's AI platform is built, deployed, and scaled for our own cloud and for customers running it in their own environments.

What you will do

You won't just keep things running - you'll help define how our platform is built, deployed, and scaled across cloud and customer environments.
  • Build and operate real-world infrastructure. Design, configure, and evolve infrastructure that runs both in our cloud and inside customer environments (SaaS, private cloud, on-prem).
  • Make self-hosted production-ready. Help us deliver a production-grade self-hosted platform that can be deployed on any Kubernetes setup in weeks - not months.
  • Drive automation & platform maturity. Improve CI/CD pipelines, GitHub workflows, and GitOps setups so teams can ship faster with confidence.
  • Reduce complexity and cost. Continuously simplify systems and optimize infrastructure spend without compromising performance or reliability.
  • Shape how we build. Champion best practices in reliability, scalability, and security across the organization, not as rules but as working systems.

Requirements

  • 2-5 years of experience working with large-scale production infrastructure
  • Fluent German language skills
  • Experience with distributed or service-oriented architectures
  • Hands-on expertise with:
    • AWS
    • Kubernetes
    • CI/CD and GitOps (e.g. ArgoCD)
  • Working knowledge of Infrastructure as Code (Terraform preferred)
  • Solid troubleshooting skills - you can debug across systems, not just within one layer
  • A pragmatic mindset: you balance speed, simplicity, and reliability
  • Ownership and accountability - you take responsibility for systems end-to-end
  • Ability to work independently while staying aligned with the team's goals

Nice to have

  • Familiarity with observability stacks (e.g. Datadog, Prometheus)
  • Experience optimizing cloud costs at scale
  • Interest or experience in Machine Learning / LLM systems
  • Experience improving developer experience and platform tooling using AI agents
  • Contributions to SRE practices like postmortems, SLIs/SLOs, and reliability engineering culture

Benefits

  • Remote-first setup with flexible hours & tech of your choice
  • 30 days vacation, extra days for family sick leave
  • Competitive salary & stock options for every team member
  • Monthly sports & mental health support allowance with Oliva
  • Annual learning & development budget
  • Monthly team socials & in-person meetups
  • Dog-friendly Berlin HQ

About us

Founded in 2018, deepset builds open and enterprise-grade tools that help teams build AI with purpose. From Haystack, our open-source framework, to the Haystack Enterprise Platform, we give developers and organizations the building blocks to solve complex, high-impact challenges with AI with full control, transparency, and sovereignty. Backed by GV and Balderton, we're growing the world's production AI community and customer base, solving challenges too critical to get wrong.

Visit us to learn more: deepset Website, Haystack Website, GitHub, LinkedIn, X deepset (Twitter), X haystack (Twitter)

Required Experience:

IC

About Company

Our vision: We make machines understand language so that humans can achieve more. Our mission: To make custom AI solutions accessible to every organization, driving adoption and impact. By combining innovation with expertise, we simplify the complexity of LLM agent and application dev ...