As part of the Multimodal Intelligence Team (MINT) with a track record of delivering innovations from the Apple Foundation Model to real-world applications like Visual Intelligence you will tackle the practical challenges of scaling optimizing for building large models as well as integrating such models and agents into Apple products. Youll collaborate with world-class engineers and scientists to push the boundaries of foundation models and agentic systems while delivering real-world impact
Currently pursuing a PhD degree or equivalent experience in Machine Learning Computer Vision Natural Language Processing Data Science Statistics or related areas.
Experience with large language models or vision language models and their application in agentic systems.
Proficient programming skills in Python and experience with at least one modern deep learning framework (PyTorch JAX or TensorFlow).
Demonstrated publication record in relevant conferences (e.g. NeurIPS ICML ICLR CVPR etc).
Experience with foundation models (language vision-language or multimodal).
Experience post training (SFT or RL) for optimizing large models for agentic systems.
Disclaimer: Drjobpro.com is only a platform that connects job seekers and employers. Applicants are advised to conduct their own independent research into the credentials of the prospective employer.We always make certain that our clients do not endorse any request for money payments, thus we advise against sharing any personal or bank-related information with any third party. If you suspect fraud or malpractice, please contact us via contact us page.