Site Reliability Engineer (SRE) Data Centers

STRAGO

Job location: Riyadh, Saudi Arabia

Monthly salary: Not disclosed

Date posted: More than 30 days ago

Number of vacancies: 1


Job Summary

Site Reliability Engineer (SRE) - Data Centres

Location: Riyadh / NEOM, Kingdom of Saudi Arabia (KSA)

Sector: Hyperscale Cloud Operations & AI Infrastructure

Role Type: Full-Time / Permanent

Role Objective

As the Kingdom moves toward becoming a global hub for AI and Cloud technology, the stability of our physical and virtual infrastructure is paramount. We are looking for a Site Reliability Engineer (SRE) to apply a software engineering mindset to system administration. You will be the bridge between our massive physical hardware footprint and the automated software layers that run on it, ensuring our services are fast, reliable, and scalable.

Key Responsibilities

1. Infrastructure Automation & IaC

Replace manual operational tasks with automated workflows using Python, Go, or Bash.
Deploy and manage infrastructure as code (IaC) using Terraform, Ansible, or Pulumi.
Maintain and scale Kubernetes (K8s) clusters across multiple availability zones within KSA.

2. Monitoring & Incident Management

Design and implement robust observability stacks using Prometheus, Grafana, and ELK.
Participate in a 24/7 on-call rotation to manage high-severity incidents, and write thorough post-mortem (RCA) reports so that issues do not recur.
Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

3. Performance Tuning & Scaling

Optimize the interaction between high-performance AI workloads (GPUs) and the underlying Linux kernel/network stack.
Collaborate with hardware teams to ensure efficient thermal management and power consumption during peak loads.

4. Security & Sovereignty

Implement security protocols in alignment with SDAIA (Saudi Data & AI Authority) and NCA (National Cybersecurity Authority) regulations.
Ensure data residency requirements are met within the Kingdom's borders.

Required Qualifications & Skills

Education: Bachelor's degree in Computer Science, Software Engineering, or a related field.
Technical Stack:
- Strong proficiency in Linux/Unix administration.
- Experience with containerization (Docker/Kubernetes).
- Hands-on experience with at least one major cloud provider (AWS, GCP, Azure, or Oracle Cloud).
Experience: 3 years in an SRE, DevOps, or Systems Engineering role, preferably in a high-scale environment.
Local Standing: Valid registration with the Saudi Council of Engineers (SCE).

Preferred Attributes

Experience with distributed storage systems (Ceph, GlusterFS).
Knowledge of networking protocols (BGP, OSPF) in a data center context.
Familiarity with the unique infrastructure challenges of NEOM's cognitive city framework.

Application Notice

STRAGO, an equal employment opportunity employer, is recruiting on behalf of our client. If your application matches the required profile, you will be contacted to go ahead with the selection process.


