Site Reliability Engineer (SRE) Data Centers

STRAGO


Job location: Riyadh, Saudi Arabia

Monthly salary: Not disclosed
Date posted: Posted 3 hours ago
Number of vacancies: 1

Job Summary

Site Reliability Engineer (SRE) - Data Centres

Location: Riyadh / NEOM, Kingdom of Saudi Arabia (KSA)

Sector: Hyperscale Cloud Operations & AI Infrastructure

Role Type: Full-Time / Permanent

Role Objective

As the Kingdom moves toward becoming a global hub for AI and Cloud technology, the stability of our physical and virtual infrastructure is paramount. We are looking for a Site Reliability Engineer (SRE) to apply a software engineering mindset to system administration. You will be the bridge between our massive physical hardware footprint and the automated software layers that power them, ensuring our services are fast, reliable, and scalable.

Key Responsibilities

1. Infrastructure Automation & IaC

  • Replace manual operational tasks with automated workflows using Python, Go, or Bash.

  • Deploy and manage infrastructure using Terraform, Ansible, or Pulumi (Infrastructure as Code).

  • Maintain and scale Kubernetes (K8s) clusters across multiple availability zones within KSA.

2. Monitoring & Incident Management

  • Design and implement robust observability stacks using Prometheus, Grafana, and ELK.

  • Participate in a 24/7 on-call rotation to manage high-severity incidents, and produce thorough post-mortem (RCA) reports so that issues do not recur.

  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

3. Performance Tuning & Scaling

  • Optimize the interaction between high-performance AI workloads (GPUs) and the underlying Linux kernel/network stack.

  • Collaborate with hardware teams to ensure efficient thermal and power consumption during peak loads.

4. Security & Sovereignty

  • Implement security protocols in alignment with SDAIA (Saudi Data & AI Authority) and NCA (National Cybersecurity Authority) regulations.

  • Ensure data residency requirements are met within the Kingdom's borders.

Required Qualifications & Skills

  • Education: Bachelor's degree in Computer Science, Software Engineering, or a related field.

  • Technical Stack:

    • Strong proficiency in Linux/Unix administration.

    • Experience with containerization (Docker/Kubernetes).

    • Hands-on experience with at least one major cloud provider (AWS, GCP, Azure, or Oracle Cloud).

  • Experience: 3 years in an SRE, DevOps, or Systems Engineering role, preferably in a high-scale environment.

  • Local Standing: Valid registration with the Saudi Council of Engineers (SCE).

Preferred Attributes

  • Experience with distributed storage systems (Ceph, GlusterFS).

  • Knowledge of networking protocols (BGP, OSPF) in a data center context.

  • Familiarity with the unique infrastructure challenges of NEOM's cognitive city framework.

Application Notice

STRAGO, an equal employment opportunity employer, is recruiting on behalf of our client. If your application matches the required profile, you will be contacted to proceed with the selection process.
