Engage with our product teams to understand requirements, and design and implement resilient and scalable infrastructure. Monitor and triage all aspects of our production and non-production environments. Collaborate with other engineers on code, infrastructure, design reviews, and process, and integrate new technologies to improve system reliability and security. Implement automation to provision, configure, deploy, and monitor Apple services. Participate in an on-call rotation, providing hands-on technical expertise during service-impacting events. Contribute to capacity planning, scale testing, and disaster recovery, and approach operational problems with a software engineering mindset.
6 years of demonstrated expertise in a Site Reliability Engineering, Infrastructure Ops, or DevOps-focused role.
Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, capacity planning, automation, and toil reduction.
Proficiency in at least one programming language: Python, Go, or Java.
Experience managing and scaling distributed systems in a public, private, or hybrid cloud environment.
Experience with microservices architecture and container orchestration using Kubernetes or similar technologies.
BS or MS in Computer Science or a related field, or equivalent work experience.
Experience running Tier 1 services for 24/7 support.
Strong understanding of Linux operating system fundamentals, networking principles, and system management.
Strong sense of ownership with a desire to communicate and collaborate with other engineers and teams.