While stateless applications are easily replaced, StatefulSets are the bedrock of our data integrity and service uptime. We are seeking a specialist to architect and manage the entire lifecycle of stateful workloads within our Azure-based MSF (Microservices Framework).
Your mission is to ensure that databases, message brokers, and persistent storage layers are architected for 99.99% availability. You will move us away from snowflake configurations toward a fully automated, self-healing stateful infrastructure where manual intervention is a relic of the past.
Working hours: 15:00-23:00 CET.
Responsibilities:
- Lifecycle Orchestration: Automate the end-to-end lifecycle of StatefulSets: provisioning, seamless volume expansion, graceful termination, and automated re-attachment during node failures.
- High Availability & Uptime: Implement advanced scheduling logic (Pod Topology Spread Constraints, Anti-affinity) to ensure stateful workloads survive zonal outages and maintenance windows.
- Storage Performance & Tuning: Optimize Azure Disk (Premium/Ultra) and Azure NetApp Files integration via CSI drivers to minimize IOPS bottlenecks and latency.
- Disaster Recovery Automation: Develop and test automated Snapshot-to-Restore pipelines. Ensure that the Actual State of data volumes can be recovered to the Goal State in minutes, not hours.
- Infrastructure as Code: Utilize Terraform to provision the hardened Azure foundation (Disk Encryption Sets, Proximity Placement Groups, and Networking) required for high-performance stateful clusters.
Basic Qualifications:
- Kubernetes Internal Mastery: Expert-level understanding of StatefulSet controllers, Persistent Volume Claims (PVCs), and the Container Storage Interface (CSI).
- AKS Specialist: Deep experience with Azure Kubernetes Service (AKS), specifically around persistent storage integration and Azure-specific networking constraints.
- Automation & Scripting: Proficient in Go or Python/Bash for writing custom controllers or lifecycle hooks (preStop/postStart) that ensure data consistency during updates.
- Reliability Engineering: Proven track record of managing production databases or distributed systems (e.g., Postgres, ClickHouse, Elasticsearch) on Kubernetes.
We offer*:
- Flexible working format - remote, office-based, or flexible
- A competitive salary and good compensation package
- Personalized career growth
- Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
- Active tech communities with regular knowledge sharing
- Education reimbursement
- Memorable anniversary presents
- Corporate events and team buildings
- Other location-specific benefits
*not applicable for freelancers
Required Experience:
IC