We are looking for an experienced Site Reliability Engineer to ensure the stability, scalability, and operational excellence of a Kubernetes-based platform running in a hybrid environment.
The project is entering a pivotal phase, with a major go-live planned for mid-February and a target audience of 75,000 users. User onboarding is already underway: more than 5,000 users are connected and expected to be active by year-end. While the system is stable, we anticipate increased activity and new challenges in January, in February, and after the go-live, making this an exciting opportunity to make a real impact. The role focuses on performance optimization, scaling strategies, observability, and reliability engineering.
Required Skills:
- 4 years of experience as an SRE / DevOps Engineer
- Strong hands-on experience with Kubernetes in production
- Experience working with hybrid infrastructure (on-prem and cloud)
- Solid knowledge of PostgreSQL performance tuning and scaling
- Experience with Qdrant or other vector databases
- Experience with Helm, Kubernetes autoscaling, and resource optimization
- Familiarity with observability stacks (Prometheus, Grafana, ELK/Loki)
- Understanding of performance engineering and load testing
- Experience with Linux systems and networking
- Strong troubleshooting and incident-management skills
Nice to Have:
- Experience with STACKIT or other sovereign clouds
- Experience with PgBouncer
- Knowledge of SRE practices (SLO/SLI)
- Experience in regulated or public-sector environments
- German language skills
Responsibilities:
- Operate and optimize hybrid infrastructure (on-prem & STACKIT)
- Manage and scale Kubernetes clusters
- Optimize Helm charts, resource usage, and autoscaling
- Conduct performance, load, and stress testing
- Ensure reliability, availability, and monitoring of production systems
- Tune and operate PostgreSQL
- Operate and optimize vector databases (e.g., Qdrant)
- Implement monitoring, logging, and alerting
- Support incident response and capacity planning
We offer*:
- Flexible working format: remote, office-based, or flexible
- A competitive salary and compensation package
- Personalized career growth
- Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
- Active tech communities with regular knowledge sharing
- Education reimbursement
- Memorable anniversary presents
- Corporate events and team-building activities
- Other location-specific benefits
*Not applicable to freelancers
Required Experience:
Individual contributor (IC)