Join the team shaping the data foundation and intelligence for Apple's frontier foundation models. We believe that breakthrough AI capabilities are driven not only by model architecture and scale but also by the quality, diversity, and intelligence of the data used to train them. As part of the Apple Foundation Model team, you will help define how next-generation foundation models learn, reason, plan, and interact with the world, powering intelligent experiences used by billions of people. This is a rare opportunity to work at the intersection of cutting-edge AI research, large-scale training and data systems, and impactful consumer products.
As a member of Apple's Foundation Models team, you will develop the data strategies, pipelines, and methodologies that drive model capability across the full training lifecycle, including pre-training, mid-training, and post-training. You will work closely with researchers, engineers, and product teams to identify capability gaps, design data-centric solutions, and create high-quality training signals for reasoning, agentic behavior, multimodal understanding, tool use, and alignment. Your work may span large-scale data curation, synthetic data generation, data recipe development, model ablation, benchmark-driven optimization, reward modeling, evaluation systems, and data flywheels that continuously improve model performance. Every dataset, evaluation, and insight you contribute will directly influence the capabilities of the foundation models powering Apple's next generation of intelligent experiences.
Drive data strategy and mixture design across the foundation model training lifecycle, including pre-training, mid-training, and post-training.
Build scalable data generation, curation, and quality assessment systems for text, multimodal, reasoning, and agentic training.
Synthetic data pipelines that enable models to learn complex capabilities such as reasoning, planning, coding, tool use, and multimodal understanding.
Model self-improvement and self-iteration frameworks that leverage foundation models to generate, evaluate, refine, and evolve their own training data.
Data flywheels that transform model feedback, evaluations, and user interactions into high-quality training signals for continual capability improvement.
Benchmark-driven methodologies to identify capability gaps, diagnose failure modes, and translate insights into targeted data.
Frontier capabilities in reasoning, agentic systems, alignment, and long-horizon tasks.
State-of-the-art techniques in data-centric AI, including reward modeling, preference learning, model self-evolution, and scalable alignment for foundation models.
Demonstrated expertise in LLMs or multimodal LLMs, with a publication record in relevant conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, KDD, ACL, ICASSP, Interspeech) or a track record of applying deep learning techniques to products.
Proficient programming skills in Python and one of the deep learning toolkits such as JAX, PyTorch, or TensorFlow.
Ability to work in a collaborative environment.
Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, or a related technical field, or equivalent practical experience.
Experience developing data-centric solutions for foundation models, especially with large-scale data.
Improving foundation models using user interaction data, private data, or other real-world feedback signals while maintaining strong privacy and data governance.
Building agentic systems, tool-use capabilities, and reasoning with model self-improvement.
Developing or improving multimodal foundation models across text, vision, audio, and video.
Required Experience:
IC
Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...