In this role, you will work on LLM-based question answering and Apple Intelligence features to provide concise, accurate, and grounded information that helps users complete their tasks quickly on Apple devices. Your core responsibilities will include:
* Designing and developing advanced reinforcement learning technologies for the post-training of generative models and delivering the end-user experience.
* Driving cross-functional technical initiatives, collaborating with research, engineering, and production teams to translate theoretical advances into deployable systems.
* Developing novel, cutting-edge RL algorithms and improving existing ones.
* Staying up to date with the latest RL research and integrating best practices into the team's workflow.
* Working on the end-to-end ML lifecycle: algorithm design and implementation, data collection, model training, evaluation, and deployment.
10 years of ML experience in search, natural language processing/understanding, and conversational AI.
Proven experience in LLM post-training, including but not limited to SFT, RLHF, RLAIF, reward modeling, chain-of-thought, and agentic LLMs.
Hands-on experience building RL pipelines and training agents in simulation or real-world environments.
Growth mindset and ability to learn new technologies.
MS or Ph.D. in Computer Science or Machine Learning, with a specialty in reinforcement learning or a related field.
Deep expertise in reinforcement learning-based post-training of LLMs, reward modeling, RLHF, RLAIF, chain-of-thought, and agentic AI R&D.
Deep understanding of cutting-edge RL algorithms and large language models.
Deep understanding of LLM pre-training and post-training.