We are looking for seasoned software and systems engineers to join the Block Storage SRE team at Apple! The role involves a tremendous amount of individual responsibility and influence over the direction of the platform, shaping its use by many critical Apple Cloud services for years to come. You are someone with ideas and a real passion for software delivered as a service to improve reuse, efficiency, and simplicity. This engineer's work will impact hundreds of millions of users and be essential to the success of some of the most visible current and future Apple features. At Apple Cloud we run a mix of open source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software development and deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results matter. Does this sound like you? We want you to join our team!
Minimum of 5 years of experience in a Site Reliability Engineer or Infrastructure Software Development role.
Experience in building, operating, and scaling distributed storage systems in a private, public, or hybrid cloud environment.
The ability to design, author, review, and release code in one or more high-level languages (e.g., Go (preferred), Rust, Python, and/or Java).
Good understanding of block, object, and file storage solutions in Linux (such as LVM, XFS, ext4, S3, Ceph, Gluster, NFS).
Familiarity with microservices architecture and container orchestration with Kubernetes.
Understanding of Linux internals, standard networking protocols, and distributed systems.
Experience with provisioning, data migration, backup and recovery, at-scale testing, disaster recovery, and capacity planning.
Technical (Engineering or Computer Science) BS/MS degree or equivalent work experience.
Acute drive to automate manual operations and to improve them with well-defined and tested APIs.
Awareness of protocols for deployment of storage systems, including the implications of physical and virtual deployment models for change management, failure domains, hardware lifecycle management, etc.
Experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks.
Experienced in SRE principles such as monitoring, alerting, error budgets, fault analysis, and other common concepts in reliability engineering. Skilled at seeing opportunities to reduce manual work through enhancements in code and processes.
Familiarity with relational and non-relational databases (such as Cassandra, Postgres, and RocksDB).