Observability Site Reliability Engineer

Job Location

London - UK

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

Apple Services Engineering infrastructure is BIG. Operating at our scale across multiple geographically dispersed data centers and servicing hundreds of millions of users presents unique challenges. As an SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. SREs at Apple own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are broad. Apple runs the majority of its systems on Linux. We run a mix of open source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.


  • Strong understanding of the Linux operating system and TCP/IP suite of networking protocols
  • Ability to design, author, and release code in languages like Go or Python
  • Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, or Ansible)
  • Familiarity with microservices architecture and container orchestration with Kubernetes


  • Bare metal management experience and experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks
  • Acute drive to automate manual operations and to improve them through repeated iteration
  • Experience with scale testing, disaster recovery, and capacity planning, and with managing and scaling distributed systems in a public, private, or hybrid cloud environment
  • Experience with the Prometheus ecosystem and a good understanding of infrastructure observability principles

Employment Type

Full Time