AI Engineer Multimodal Intelligence

Apple


Job Location:

Sunnyvale, CA - USA

Monthly Salary: Not Disclosed
Posted on: 20 hours ago
Vacancies: 1 Vacancy

Job Summary

Are you excited about the amazing potential of foundation models, LLMs, and multimodal LLMs? We are looking for individuals who thrive on collaboration and have a desire to push the boundaries of what is possible today! The VCV org is a centralized applied research and engineering organization responsible for developing real-time, on-device Computer Vision and Machine Perception technologies across Apple products. In the Human Intelligence team, we balance research and product to deliver Apple-quality, pioneering experiences, innovating through the full stack and partnering with HW, SW, and ML teams to influence the sensor and silicon roadmap that brings our vision to life. Join us in this truly exciting era of Artificial Intelligence to help deliver the next groundbreaking Apple products and experiences! We are continuously advancing the state of the art in Computer Vision and Machine Learning, touching all aspects of multimodal LLMs, from data collection and data curation to modeling, evaluation, and deployment. As a member of our dynamic group, you will have the unique and rewarding opportunity to craft upcoming research directions in the field of multimodal LLMs that will inspire future Apple products.

We are seeking highly motivated and skilled engineers to join our Human Intelligence team. The ideal candidates will have strong backgrounds in developing and exploring capabilities of foundation models and agentic AI systems that enable natural, proactive, and personalized human interactions. You will be responsible for multimodal LLM development, including training, fine-tuning, agentic AI, and reasoning. In this role, you will work on cutting-edge research and engineering problems, collaborating across teams, and help shape the technical direction of multimodal and agentic AI systems from research to production. You will lead and contribute to the research roadmap for multimodal foundation models, identifying key opportunities for innovation in agentic AI and reasoning capabilities. You will design and implement agentic systems and large-scale simulation and evaluation frameworks that can transition from research prototypes to production-grade technologies.

Develop, train, and fine-tune multimodal LLMs across image, video, text, and audio modalities, from data curation through deployment.
Design and build video/audio encoders, tokenizers, and generative models for multimodal understanding and generation.
Design and implement agentic AI systems that enable reliable reasoning for natural, proactive, and personalized human interactions.
Build end-to-end ML systems that transition from research prototypes to production-grade technologies at scale.
Collaborate across HW, SW, and ML teams to influence sensor and silicon roadmaps and deliver pioneering on-device experiences.
Evaluate and improve ML codebases, ensuring correctness, efficiency, and maintainable engineering.
Contribute to the team's research direction, identify opportunities for innovation, and help shape product features.

Master's or equivalent practical experience in Computer Science, Computer Vision, Machine Learning, or a related technical field.
3 years of relevant academic or industry experience in Machine Learning, Computer Vision, or Artificial Intelligence.
Experience in deep learning with demonstrated work in multimodal systems (e.g., vision, language, video, etc.).
Proficiency in Python and in a modern deep learning framework such as PyTorch or ...
Experience with foundation models (language or multimodal), including training, fine-tuning, and ...
Experience developing, training, and fine-tuning multimodal ...
Foundations in optimization, probability, and linear algebra as applied to machine learning and computer vision.

PhD or equivalent practical experience in Computer Science, Machine Learning, Computer Vision, or a related technical field, with a focus on AI, machine learning, or computer vision.
Expertise in developing, training, and fine-tuning multimodal LLMs at scale and developing industry-scale agentic systems.
Track record of technical leadership, including architecting complex ML systems and leading projects from conception to product ...
Experience applying foundation models to build autonomous or semi-autonomous agents, including planning, task decomposition, and multi-step ...
Publication record in top-tier venues such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, COLM.
Experience with large-scale distributed training and model ...
Communication skills and ability to present research findings to both technical and non-technical audiences.

Required Experience:

IC

About Company

Ask Siri to name the most successful company in the world and it might respond: Apple. And it's not just out of familial pride. Apple consistently ranks highly in profit, revenue, market capitalization, and consumer cachet. In 2018, the company became the first to reach a trillion dollar ...
