SRE DataPlatform
Job Summary
Being an SRE at VeepeeTech means being part of a cross-functional SRE community while joining a product-oriented Data Platform team.
You will contribute to the reliability, scalability, and operability of critical data services by applying SRE and DevOps practices while sharing knowledge across teams.
The Data Platform is currently evolving toward a modern lakehouse architecture deployed on VeepeeCloud (our on-prem platform), based on technologies such as Trino, Iceberg, and object storage, with strong ambitions around performance, cost efficiency, and platform ownership.
You will work in a distributed environment (France & Spain) within a team of 40 to 50 data professionals across engineering, analytics, data science, and governance.
You will play a key role in ensuring the reliability and scalability of this next-generation data platform while supporting the transition from public cloud to hybrid/on-prem architectures.
TASKS
Platform Reliability & Operations
Ensure reliability and performance of our data platform services (Trino, Iceberg, S3, Kafka, Flink)
Define and implement SRE best practices: SLIs/SLOs, error budgets, observability
Build and maintain monitoring, alerting, and incident response frameworks (Prometheus, Grafana, etc.)
Cloud Migration & Architecture
Contribute to the migration from the public cloud data warehouse to the VeepeeCloud lakehouse stack
Support the coexistence of cloud and on-prem systems while ensuring consistency and reliability
Help design resilient architectures for ingestion, transformation, and serving layers
Kubernetes & Infrastructure
Operate and improve services running on Kubernetes (GKE/EKS & on-prem clusters)
Automate infrastructure provisioning using Terraform, Atlantis, and/or Crossplane
Improve GitOps workflows for platform deployment and configuration
FinOps & Performance Optimization
Collaborate with teams to optimize compute/storage usage (Trino queries, BigQuery slots, etc.)
Build tools and dashboards to track cost, usage, and efficiency
Support the transition toward cost-efficient on-prem workloads
Developer Enablement
Improve self-service capabilities for data teams (e.g. provisioning Trino/Iceberg resources)
Help teams adopt best practices in reliability, observability, and deployment
Write clear technical documentation and runbooks
Resilience & DRP
Contribute to Disaster Recovery Plan (DRP) definition and implementation
Ensure multi-DC resilience (FR1 / NL1) and data replication strategies
Participate in incident management and postmortems
MUST HAVE skills
Strong experience with Kubernetes in production environments
Experience with distributed data systems (or strong willingness to learn)
Solid understanding of SRE principles (monitoring, alerting, SLAs/SLOs)
Experience with Infrastructure as Code (Terraform or similar)
Familiarity with GitOps workflows
Experience with observability tools (Prometheus, Grafana, logging systems)
Comfortable working in cloud environments
Strong collaboration mindset and ability to work across teams
Fluent in English
NICE TO HAVE skills
Experience with Trino, Iceberg, or data lakehouse architectures
Experience with Ceph, S3, or other object storage systems
Knowledge of Kafka / Flink / Airflow
Experience with FinOps practices and cost optimization
Experience with Crossplane or platform self-service models
Programming skills (Python, Java, or Go)
Experience with multi-region / multi-DC architectures
BENEFITS
Variable bonus;
The dynamic and creative environment within international teams;
The variety of self-education courses on our e-learning platform;
Participation in meetups and conferences locally and internationally;
Flexible Office with up to 3 days at home
RECRUITMENT PROCESS
1. 30-minute HR screen with a Veepeeᵀᵉᶜʰ Recruiter
2. General technical exchange
3. Technical exchange with the manager
4. Team interview
We are convinced that it is up to you to define the way you work, to develop yourself, and to progress.
At Veepee, we guarantee that you can just be yourself!
In support of diversity and inclusion, Veepee is committed to reviewing all applications received on an equal basis.
COMPANY
For more information about our ecosystem:
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.