System Reliability Engineer II

Beyond ONE

Job location:

Riyadh, Saudi Arabia

Monthly salary: Not disclosed
Posted: more than 30 days ago
Number of vacancies: 1

Job summary

We don't think about job roles in a traditional way. We are anti-silo. Anti-career stagnation. Anti-conventional.

Beyond ONE is a digital services provider radically reshaping the personalised digital ecosystems of consumers in high-growth markets around the world. We're building a digital services aggregator platform with a strong telco foundation and a profitable growth strategy that empowers users to drive their own experience: subscribe once, source from many, and only pay for what you actually use.

Since being founded in 2021, we've acquired Virgin Mobile MEA, Friendi Mobile MEA, and Virgin Mobile LATAM (with 6.5 million subscribers) and 1,600 dedicated colleagues across Chile, Colombia, KSA, Kuwait, Mexico, Oman, and UAE.

To disrupt for good takes a rebellious spirit, a questioning mind, and a warm heart. We really care about how to get things done and not who manages who. We benefit from our diversity, and together we disrupt the way we and others think about our lives, for good.

Do you want to exchange ideas, learn from each other, and leave your mark on our journey? This is the place for you.

Role Purpose
Why this role matters: As a Site Reliability Engineer (SRE), you will play a key role in enhancing system reliability, scalability, and performance through automation, monitoring, and operational excellence. Your contributions will help shape our reliability engineering practices and platform stability, ultimately transforming how we deliver resilient and scalable services to users.

What success looks like: In your first year, you will:

  • Build and maintain automated systems to improve service uptime and incident response.
  • Implement and refine monitoring and alerting strategies to proactively detect issues.
  • Drive operational efficiencies by reducing toil and introducing reliability-focused tooling.

Why this is for you: If you're keen on solving availability, latency, and performance issues at scale, hit us up. We're looking for someone ready to tackle this challenge head-on and make an impact from day one.

Key Responsibilities
In this role you will:

  • Lead the development of resilient, highly available systems and incident response strategies.
  • Collaborate with software and infrastructure teams, driving reliability and observability initiatives.
  • Manage production infrastructure and environments, ensuring optimal performance and uptime.
  • Automate operational tasks using infrastructure-as-code and scripting tools.
  • Design and maintain monitoring and alerting systems using Prometheus, Grafana, or similar.
  • Conduct blameless postmortems and implement learnings to prevent future incidents.
  • Implement SLOs, SLIs, and error budgets to guide engineering decisions.
  • Optimize CI/CD pipelines and deployment processes for reliability and speed.
  • Engage with stakeholders to align reliability goals with business outcomes.

Qualifications & Attributes
We're seeking someone who embodies the following:

Education: Bachelor's degree in Computer Science, Engineering, or a related field.
Experience: 3 years in Site Reliability Engineering, DevOps, or similar operational roles.

Technical Skills:
Must-haves:

  • Strong background in Linux/Unix systems and network administration.
  • Experience with cloud platforms (AWS, Azure, or GCP).
  • Experience implementing SLOs, SLIs, and error budget policies.
  • Proficiency in infrastructure automation (Terraform, Ansible) and scripting (Python, Go, or Bash).
  • Deep understanding of monitoring, observability, and incident management tools (Prometheus, Grafana, Splunk, etc.).
  • Solid grasp of CI/CD practices, containerization (Docker), and orchestration (Kubernetes).

Nice-to-haves:

  • Familiarity with distributed systems, service meshes, and performance tuning.

Unique Attributes:

  • Thrives in fast-paced environments requiring quick decision-making.
  • Possesses a proactive mindset and a calm, analytical approach to troubleshooting under pressure.
  • Excels with SRE best practices, modern ops philosophies, and large-scale system thinking.

What we offer:

  • Rapid learning opportunities - we enable learning through flexible career paths and exposure to challenging and meaningful work that will help build and strengthen your expertise.
  • Hybrid work environment - flexibility to work from home 2 days a week.
  • Healthcare and other local benefits offered in market.

By submitting your application, you acknowledge and consent to the use of Greenhouse & BrightHire during the recruitment process. This may include the storage and processing of your data on servers located outside your country of residence. For further information, please contact us at

Required skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • Computerized Maintenance Management
  • Maintenance
  • Mechanical Engineer
  • Manufacturing
  • Troubleshooting