Senior Site Reliability Engineer
Job Summary
You're a seasoned Site Reliability Engineer with years spent running production Kubernetes at scale, and you're the kind of engineer who takes the initiative when something can be better: observability, resilience, a tricky upgrade, or the way the team thinks about for a role where that initiative has room to turn into real improvements on a platform that customers trust with their most confidential data.
In this role, you'll join our operations team for our MeetingSuite product in Munich, a flat and diverse SRE team of four where influence comes from example rather than authority. Your day-to-day is keeping our Kubernetes platforms observable, resilient, and boring-to-upgrade: GitOps with Flux, multi-AZ design, zero-downtime releases, and a centralised observability story every service owner can use without calling SRE. Alongside that, you'll partner closely with our Application Security Engineer on Kubernetes and container security, with room to grow into our security champion over time, to keep the bar high for the DAX 30 and other DACH customers we serve.
If multi-cluster Kubernetes, GitOps, logging, monitoring, and NoSQL database management on Kubernetes are in your vocabulary, read on.
Here's a breakdown of what you'll do (not all of it, just the important stuff):
- Operate and continuously improve our Kubernetes production platforms, contributing to zero-downtime upgrades and multi-AZ resilience as team-wide goals.
- Grow into the team's expert on our ELK-based log platform, centralised cross-cluster monitoring, and anomaly detection, so every service owner can see, alert on, and debug their workload without SRE hand-holding. Maintain and evolve our Prometheus alerting rules and Grafana dashboards alongside the team.
- Partner with our Application Security Engineer on Kubernetes and container security (admission control, workload identity, secrets management, network segmentation, and runtime threat detection), with an interest in growing into our security champion over time.
- Love automation. Chip away at operational toil (deployments, monitoring setup, internal reporting), building on the baseline the team already has, and ship reliably through our GitOps workflow (Flux, GitLab CI).
- Participate in our Standby and Daily Business rotation, lead incident response, run blameless post-mortems, and drive the resulting action items to completion.
These are the essentials you'll need to get an interview:
- Several years of hands-on SRE, DevOps, or Platform Engineering experience, including meaningful time running production Kubernetes at scale.
- Strong Kubernetes expertise, with deep hands-on experience in at least one area (cluster lifecycle and upgrades, workload identity and RBAC, admission control, network policies, or custom resources and operators) and working familiarity with the rest.
- Solid grasp of Kubernetes and container security (secrets management, network segmentation, and runtime protection) and an interest in growing into our security champion alongside our Application Security Engineer.
- Proven depth in the ELK stack (or a very similar log platform): pipelines, indexing, dashboards, and alerting, with an interest in growing into the team's observability expert. Working knowledge of Prometheus and Grafana.
- Comfortable with GitOps and CI/CD as a daily way of working (we run Flux and GitLab CI; equivalents like Argo CD, GitHub Actions, or Jenkins are fine), and hands-on experience with Helm and Kustomize for managing manifests. Solid coding in Go, Python, or Bash, with a love for automating away repetitive work.
- Comfortable being on-call and leading incidents calmly under pressure.
- Professional fluency in German and excellent English; at home working in a diverse team.
It would be great if you had these too, but we'll support you if you don't:
- Experience in regulated industries (financial services, legal, healthcare, defence) or under compliance frameworks such as ISO 27001 or C5.
- Track record of designing or contributing to custom Kubernetes Operators.
- Service-mesh experience (Istio, Linkerd, Cilium).
- A demonstrated interest in working shoulder-to-shoulder with AppSec engineers to raise platform security posture.
- Experience operating Couchbase (Couchbase Operator, server groups, XDCR) or another stateful data platform on Kubernetes.
- Experience migrating ingress controllers or other cluster-wide components with zero customer downtime.
- Experience with anomaly detection on platform telemetry.
About Us
Diligent is the AI leader in governance, risk and compliance (GRC) SaaS solutions, helping more than 1 million users and 700,000 board members to clarify risk and elevate governance. The Diligent One Platform gives practitioners, the C-Suite and the board a consolidated view of their entire GRC practice so they can more effectively manage risk, build greater resilience and make better decisions, faster.
Learn more or follow us on LinkedIn and Facebook.
What Diligent Offers You
- Creativity is ingrained in our culture. We are innovative collaborators by nature. We thrive in exploring how things can be done differently, both in our internal processes and to help our clients.
- We care about our people. Diligent offers a flexible work environment, global days of service, comprehensive health benefits, meeting-free days, a generous time-off policy, and wellness programs, to name a few.
- We have teams all over the world. We may be headquartered in New York City, but we have office hubs in Washington D.C., Vancouver, London, Galway, Budapest, Munich, Bengaluru, Singapore, and Sydney.
- Diversity is important to us. Growing, maintaining, and promoting a diverse team is a top priority for us. We foster and encourage diversity through our Employee Resource Groups and provide access to resources and education to support the education of our team, facilitate dialogue, and foster understanding.
Diligent created the modern governance movement. Our world-changing idea is to empower leaders with the technology, insights, and connections they need to drive greater impact and accountability, to lead with purpose. Our employees are passionate, smart, and creative people who not only want to help build the software company of the future, but who want to make the world a more sustainable, equitable, and better place.
Headquartered in New York, Diligent has offices in Washington D.C., London, Galway, Budapest, Vancouver, Bengaluru, Munich, Singapore, and Sydney. To foster strong collaboration and connection, this role will follow a hybrid work model. If you are within commuting distance of one of our Diligent office locations, you will be expected to work onsite at least 50% of the time. We believe that in-person engagement helps drive innovation, teamwork, and a strong sense of community.
We are a drug-free workplace. Diligent is proud to be an equal opportunity employer. We do not discriminate based on race, color, religious creed, sex, national origin, ancestry, citizenship status, pregnancy, childbirth, physical disability, mental disability, age, military status, protected veteran status, marital status, registered domestic partner or civil union status, gender (including sex stereotyping and gender identity or expression), medical condition (including, but not limited to, cancer related or HIV/AIDS related), genetic information, or sexual orientation, in accordance with applicable federal, state, and local laws. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Diligent's EEO Policy and Know Your Rights. We are committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, you may contact us at .
To all recruitment agencies: Diligent does not accept unsolicited agency resumes. Please do not forward resumes to our jobs alias, Diligent employees, or any other organization location. Diligent is not responsible for any fees related to unsolicited resumes.
Required Experience:
Senior IC
About Company
Diligent, a modern governance company, is the only comprehensive governance software provider featuring tools to improve and simplify modern day governance.