Evaluation Reliability SRE

Apple


Job Location:

Cupertino, CA - USA

Monthly Salary: Not Disclosed
Posted on: 14 hours ago
Vacancies: 1 Vacancy

Job Summary

Join the team redefining what a deeply personal and integrated assistant can be.

As part of the Siri organization, you will help shape one of the world's most widely used AI assistants, powered by our next generation of Apple Intelligence, with capabilities like personal context understanding and on-screen awareness, built with privacy from the ground up. Your work will have a direct, meaningful impact for users across iOS, iPadOS, macOS, and watchOS. This is a rare opportunity to build at the intersection of cutting-edge AI and human-centered design, shipping technology that is centered around users and their needs.

Siri's quality signal drives every model and product decision before a release ships. But a signal is only as trustworthy as the infrastructure behind it. The Evaluation Reliability Engineering (ERE) team exists to make that infrastructure bulletproof. Within ERE, Core SRE owns the production backbone: resource management, session orchestration, on-call response, and the observability systems that surface failures before they corrupt evaluation signal. We sit at the intersection of distributed systems, ML evaluation infrastructure, and operations.

This is a senior, hands-on role. You share primary on-call as part of a global follow-the-sun rotation, lead incident investigations end-to-end, and set the operational bar the rest of the team works against. You are fluent with agentic coding tools like Claude Code, Cursor, or Copilot and use them as a force multiplier across runbook authoring, automation, and log analysis.

Own reliability outcomes across the evaluation infrastructure stack: orchestration, capacity, and service health
Own runbook quality across the team: author runbooks for complex failure categories and set the bar that guides other engineers to produce the same quality
Build deep expertise in the device orchestration and provisioning layers; understand quota management, retry behavior, and failure modes well enough to diagnose upstream issues independently
Instrument infrastructure components that lack observability; if a failure is hard to detect, make it easy to detect before the next occurrence
Balance incident response with proactive reliability work; automation and eliminating recurring failures are core deliverables
Partner on SLO definition and burn-rate alerting; bring the operational depth that turns reliability targets from aspirational to measurable
Influence the team's technical roadmap, mentor junior SREs, and represent infrastructure reliability posture to leadership and in cross-team reviews

5 years of site reliability, infrastructure, or platform engineering experience with direct on-call ownership in production systems
Hands-on orchestration experience (Kubernetes or equivalent): cluster health, resource management, scheduling, and failure diagnosis at scale

Experience owning or closely operating a device or VM provisioning pipeline; familiarity with virtualization-layer failure modes is a strong plus
Track record of improving system reliability against measurable outcomes (uptime, MTTR, incident frequency): not just responding to incidents but eliminating their causes
Incident command discipline: able to lead a multi-team incident from declaration to close-out
Depth in at least one of: distributed systems reliability, device management infrastructure, evaluation or ML platform operations
Demonstrated cross-team technical influence; prior experience shaping reliability practices beyond the immediate team

About Company


Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
