Our MLPT Cloud Infrastructure Team within Apple's AI/ML organization designs, builds, and scales the foundational systems that power Siri, Search, and next-generation ML, reimagining how infrastructure is managed through agentic, event-driven workflows, Crossplane compositions, and self-healing control planes to deliver Model Context Protocol (MCP)-based infrastructure servers that integrate seamlessly with ML and data workflows. You'll work closely with AI/ML engineers, SREs, and platform teams to deliver infrastructure that is automated, observable, and efficient across Apple-scale hybrid and multi-cloud environments.
- BS/MS in Computer Science or related field (or equivalent practical experience).
- 5 years of experience in distributed systems or cloud infrastructure engineering.
- Strong programming experience in Golang and/or Rust; expertise in building controllers, operators, or automation systems.
- Deep understanding of Kubernetes internals, controller-runtime, and Crossplane composition frameworks.
- Experience with ArgoCD, Helm, and Infrastructure-as-Code (Terraform, Pulumi, or Crossplane).
- Hands-on experience with GitOps, declarative configuration, and reconciliation-driven workflows.
- Proven ability to design and operate infrastructure for ML training and inference, including performance tuning and GPU optimization.
- Experience leading technical teams, driving architecture decisions, and mentoring engineers.
- Strong grounding in cloud cost efficiency, performance profiling, and system-level debugging.
- 9 years in cloud infrastructure, SRE, or distributed systems roles.
- Active contributor to CNCF open-source projects (e.g., Kubernetes, Crossplane, ArgoCD, Envoy, Prometheus).
- Deep expertise in Kubernetes API machinery, custom resources (CRDs), and control plane development.
- Experience with Model Context Protocol (MCP)-based systems or contextual orchestration servers.
- Familiarity with AIOps or agentic AI workflows in production environments.
- Strong understanding of observability, telemetry, and distributed tracing (OpenTelemetry, Prometheus, Grafana).
- Proven experience building ML infrastructure platforms (training clusters, inference services, model registries).
- Excellent communication, technical writing, and cross-functional leadership skills.
Required Experience:
Senior IC