Are you a senior engineer who can keep large AI-augmented systems running reliably at Apple scale? Apple's Stability Engineering team is looking for a seasoned engineer to join our Core team. We build and operate the platforms, services, and infrastructure that turn crash reports from Apple devices into actionable engineering insights. You'll work on systems where LLMs and agents are already part of the production fabric: evolving them, hardening them, and using AI tools to extend what a small team can deliver.
Our team owns the end-to-end platform behind stability analysis at Apple: symbolication of crash logs across the company's hardware portfolio, the data pipelines that aggregate and cluster crash logs, and the applications and services that engineers across Apple use every day to drive operating-system quality. This role is about keeping that platform healthy, extending it deliberately, and making the engineering team itself more effective by using AI tools.
Day to day, you'll spend most of your time on the engineering work of running real systems: tuning evaluation infrastructure, tightening operational controls, improving auditability and debug trails, and scaling the workflows our analysts rely on. When new capabilities are needed, you'll prototype and integrate them into the platform. You'll partner closely with stability analysts, who are domain experts in OS reliability, and with the broader team responsible for symbolication, ETL, and service infrastructure. You'll also be expected to use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.
We're looking for someone with the rigor of a seasoned production engineer who is also comfortable operating systems that include LLMs and agents as first-class components. If you enjoy taking responsibility for a complex, already-running platform and making it steadily better, we want to talk.
5 years of professional software engineering experience building and operating production systems
BS in Computer Science or a related field, or equivalent practical experience
Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
Track record of maintaining and evolving production services: observability, operational controls, incident response, and steady iteration on existing systems
Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious

Experience operating LLM- or agent-based features in production environments over time
Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
Familiarity with one or more of: Ruby on Rails, Python for production services
Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)
Required Experience:
IC
Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar