Our team is made up of talented individuals who are passionate about LLMs and about ensuring Apple services are at their best. As part of the Human-Centered AI team, you'll play a central role in enhancing the user experience. Responsibilities include:
- Collaborate with Engineering, Products, Research, Operations, and Editorial teams to evaluate the algorithms and AI models powering various features, identifying opportunities for improvement.
- Build data products (feature datasets, analyses, models, etc.) and scalable tools (typically in Python or Scala) to drive hypothesis generation and support collaborative decision-making with our partner teams in engineering and product management.
- Create structured evaluations to assess the quality of AI-generated responses, ensuring they align with company standards and customer expectations.
- Design evaluation tasks and guidelines; identify a relevant data annotation platform to run evaluations at scale.
- Implement metrics that measure the effectiveness and accuracy of models to ensure they meet performance standards.
- Establish data quality thresholds and report on metrics and insights to inform feature business decisions.
- Monitor LLM performance in production environments through human evaluations, identifying trends and raising alerts when quality degradation occurs.
- Perform detailed failure analysis to understand model weaknesses and identify areas for improvement, offering actionable insights to engineers.
- Maintain high standards for data quality and continuously enhance processes based on both quantitative and qualitative feedback.
Experience with machine learning concepts, including model evaluation metrics and data analysis. Proven data analysis expertise using SQL, Python, and Tableau to deliver actionable insights
Experience with Large Language Models and evaluation techniques
Fluency in English (reading, writing, and comprehension) to partner with international teams
Fluency in one of Chinese, Japanese, Hindi, Korean, French, Spanish, or German to support the language-specific market
Cultural understanding of one of the above-mentioned language markets to accurately represent the user experience in early development cycles
Ability to analyze complex issues and identify potential problems with LLM outputs, with keen attention to detail, in order to improve quality
Effective collaboration with cross-functional teams to define ML/LLM evaluation requirements
Experience crafting, conducting, analyzing, and interpreting experiments and investigations
Excellent communication skills
Flexibility to work early-morning or late-night shift patterns is required