As a Site Reliability Engineer, you will be responsible for providing the platform that allows mission-critical cloud systems to maintain constant uptime, scale seamlessly, and let new applications and services flourish. The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with the team's developers and architects, helping with design and implementation to improve stability, security, and scalability.

AS AN SRE AT APPLE, YOU WILL:
- Operate, monitor, and triage all aspects of our production and non-production environments.
- Design, build, and implement innovative solutions for previous, present, and future issues.
- Prepare alert-handling procedures and runbooks, and collaborate with the offshore SRE teams.
- Automate the deployment and orchestration of services into the cloud environment, as well as other routine processes.
- Actively participate in capacity planning, scale testing, and disaster recovery exercises.
- Interact with and support partner teams, including engineering, QA, and program management.
- Cultivate and maintain relationships with internal and external third-party vendors.
Bachelor's degree in Computer Science, an engineering-related field, or equivalent experience. Advanced degree preferred.
10 years in a Site Reliability Engineering or infrastructure-focused role
Must be an expert, with in-depth professional experience in cloud operations focused on infrastructure-as-a-service (compute, storage, and network virtualization)
Highly proficient in Golang and Java
Experience operating large-scale, multi-tenant infrastructure as a managed service
Ability to articulate complex technical concepts to both technical and non-technical stakeholders.
Able to troubleshoot issues across the entire infrastructure stack
Experience with Infrastructure as a Service orchestration tools (OpenStack, CloudStack, etc.) is a plus
Experience with Linux system virtualization (libvirt, QEMU, KVM, etc.) and their APIs
Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus