Our client is a fast-growing Dutch start-up that develops innovative AI solutions for healthcare. They build proprietary AI models and LLMs that run on their own in-house servers. To manage and optimize this no-cloud environment, they are looking for a Site Reliability Engineer.
This position does not offer work permit/visa sponsorship and is therefore only open to candidates with EU/EEA citizenship or who otherwise do not need sponsorship. Thanks for your understanding.
The role
As a Site Reliability Engineer you are responsible for setting up and managing the on-premise infrastructure with the latest hardware, including NVIDIA GPUs. You work together with a small team that develops the software and AI models internally and ensures that the systems are optimized for performance and security. The focus is primarily on infrastructure/server management (site reliability), but ideally you can also support the build pipelines. DevOps tasks will be limited and it is a no-cloud environment. This means:
- Managing on-premise servers
- Configuring and maintaining the Kubernetes cluster according to a GitOps approach
- Ensuring that all resources are available reliably and securely
- Ensuring optimal stability, uptime, and performance
What do we need
Key knowledge and experience:
- At least 3 years of relevant work experience
- Experience with Kubernetes and Docker
- Knowledge of on-premise Linux server management, networking, and storage
- Proficient with Bash and Python
- You prefer to work on solutions with positive societal impact
- You are looking for a close-knit and social team in a start-up/scale-up environment
- You are pragmatic and hands-on, and do not get stuck over-analyzing problems
- You can work independently and like to solve things yourself, but are not afraid to ask your colleagues for help
- Fluent in English and not afraid to work in a company with mainly Dutch colleagues
- You will not need work permit/visa sponsorship and you preferably live in the Netherlands or will move here on short notice (not dependent on job offer)
Nice to have:
- Experience with monitoring tools (Prometheus, Grafana)
- Experience with managing GPU resources
- Experience with GitOps (FluxCD)
What can you expect
- A good salary and travel allowance
- 31 vacation days
- Option to work from home up to 2 days a week
- Possibility to grow into a lead role when the company grows
- Becoming part of a company that develops its own state-of-the-art AI models and has access to the latest hardware
- The opportunity to make a meaningful contribution to improving healthcare
- Wine tastings on Friday afternoons and fun team outings