Platform Site Reliability Engineer at AI infrastructure platform startup

Job Location: London - UK

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1

Job Summary

This is a job that we are recruiting for on behalf of one of our customers.

To apply, speak to Jack. He's an AI agent that sends you unmissable jobs and then helps you ace the interview. He'll make sure you are considered for this role and help you find others if you ask.

Platform Site Reliability Engineer

Company Description:
A fast-growing AI infrastructure platform startup building the backbone for next-generation AI workloads, connecting software and hardware at scale in a highly technical, mission-critical environment.

Job Description:
As a Platform Site Reliability Engineer, you will own and evolve a highly available AI infrastructure platform, ensuring stability, security, and performance across bare-metal, virtualization, and orchestration layers. You'll deploy and optimize Kubernetes for AI workloads, drive automation, manage incidents, and mentor others while supporting a 24/7 production environment.

Location: Gloucestershire, UK

Why this role is remarkable:

  • Work at the forefront of AI infrastructure, bridging hardware and software for cutting-edge AI workloads

  • Operate and scale complex bare-metal, virtualized, and Kubernetes-based platforms

  • Make a meaningful impact on reliability, automation, and team capability within a well-funded startup

What you will do:

  • Deploy, operate, and scale Kubernetes clusters supporting AI-centric workloads

  • Optimize Linux systems and build automation for platform lifecycle management and incident response

  • Maintain observability and reliability using tools such as Prometheus and Grafana in 24/7 production environments

The ideal candidate:

  • 5 years' experience in globally scaled, performance-critical SRE environments with 24/7 operations

  • 3 years' hands-on experience deploying and running orchestration platforms, with deep Kubernetes expertise

  • Expert-level Linux administration (especially Ubuntu), strong system tuning skills, and solid networking fundamentals

How to Apply:

To apply for this job, speak to Jack, our AI recruiter.

Step 1. Visit our website.
Step 2. Click "Speak with Jack".
Step 3. Log in with your LinkedIn profile.
Step 4. Talk to Jack for 20 minutes so he can understand your experience and ambitions.
Step 5. If the hiring manager would like to meet you, Jack will make the introduction.

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Jack helps candidates find the perfect job and Jill helps companies find the perfect candidate.