Site Reliability Engineer (Edge Services), Infrastructure Services

Apple


Job Location:

Austin, TX - USA

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

We are seeking a proactive Site Reliability Engineer to champion the evolution of our production ... In this role, you will help drive the vision for our visibility, moving beyond simple uptime metrics to build a sophisticated, data-driven reliability framework. You will play a pivotal role in ensuring our services are resilient, scalable, and observable, bridging the gap between complex distributed systems and seamless user experiences.

As a key member of the SRE team, your mission is to treat operations as a software problem. You will focus on designing and implementing a next-generation observability and alerting strategy that prioritizes high-cardinality data and meaningful signals over noise. You will spend your time building self-healing systems, reducing toil through aggressive automation, and partnering with development teams to bake reliability into the CI/CD pipeline. Your goal is to move us toward a proactive stance where performance bottlenecks are identified and mitigated before they impact the customer.

B.S. in Computer Science, Computer Engineering, or a related technical field, or equivalent professional work experience.

Linux internals and deep networking expertise, including HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS. You should be comfortable debugging protocol-level issues and optimizing traffic.

Ability to automate repetitive tasks and complex workflows using Python or Go.

Experience configuring and managing modern monitoring suites (e.g., Prometheus, Grafana, ClickHouse), with a focus on creating actionable, high-signal alerts.

Knowledge of Data Structures and Algorithms (DSA) to write efficient, performant code and troubleshoot complex system issues.

Knowledge of SLIs, SLOs, Error Budgets, Release Management, and Incident Management to drive engineering priorities.

Experience managing cloud environments (AWS, GCP, or Azure) using Terraform, Ansible, or ...

Hands-on experience scaling and securing containerized workloads via ...

Track record of leading blameless post-mortems and using those insights to harden the system against future failures.

Ability to consult with product teams on service design to improve long-term reliability.

A proactive engineering mindset focused on shifting from fixing things when they break to designing things so they don't break (or so they fail gracefully).

Practical fluency in applying Generative AI tools within SRE and software engineering workflows, from accelerating observability query construction and alert design to building AI-assisted debugging and triage capabilities that encode institutional knowledge into repeatable, context-aware workflows, with the engineering rigour to validate, own, and iterate on AI-assisted outputs in production-adjacent contexts.

Required Experience:

IC

About Company

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
