Platform Site Reliability Engineer at AI infrastructure platform startup

London - UK

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

This job posting is outdated and the position may be filled

Job Summary

This is a job that we are recruiting for on behalf of one of our customers.

To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role and help you find others if you ask.

Platform Site Reliability Engineer

Company Description:
A fast-growing AI infrastructure platform startup building the backbone for next-generation AI workloads, connecting software and hardware at scale in a highly technical, mission-critical environment.

Job Description:
As a Platform Site Reliability Engineer, you will own and evolve a highly available AI infrastructure platform, ensuring stability, security, and performance across bare-metal, virtualization, and orchestration layers. You'll deploy and optimize Kubernetes for AI workloads, drive automation, manage incidents, and mentor others while supporting a 24/7 production environment.

Location: Gloucestershire, UK

Why this role is remarkable:

Work at the forefront of AI infrastructure, bridging hardware and software for cutting-edge AI workloads
Operate and scale complex bare-metal, virtualized, and Kubernetes-based platforms
Make a meaningful impact on reliability, automation, and team capability within a well-funded startup

What you will do:

Deploy, operate, and scale Kubernetes clusters supporting AI-centric workloads
Optimize Linux systems and build automation for platform lifecycle management and incident response
Maintain observability and reliability using tools such as Prometheus and Grafana in 24/7 production environments

The ideal candidate:

5 years' experience in globally scaled, performance-critical SRE environments with 24/7 operations
3 years' hands-on experience deploying and running orchestration platforms, with deep Kubernetes expertise
Expert-level Linux administration (especially Ubuntu), strong system tuning skills, and solid networking fundamentals

How to Apply:

To apply for this job, speak to Jack, our AI recruiter.

Step 1. Visit our website.
Step 2. Click "Speak with Jack".
Step 3. Log in with your LinkedIn profile.
Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions.
Step 5. If the hiring manager would like to meet you, Jack will make the introduction.
