Senior Staff Machine Learning Engineer, Data & Eval

San Francisco, CA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

The Community You Will Join:

AI and ML are at the heart of the Airbnb product. From Trust to Payments and from Customer Service to Marketing, we rely on ML to ensure that guests and hosts have the best possible experience with Airbnb.

The Core ML team is responsible for driving CSxAI (Customer Support x Artificial Intelligence) initiatives by adopting Generative AI technologies to enable an intelligent, scalable, and exceptional service experience. The team develops and enhances AI models, ML services, and tools, including LLM fine-tuning and optimization, RAG/Search, LLM evaluation and testing automation, feedback-based learning, and guardrails, for a wide range of applications at Airbnb.

The richness of Airbnb's data, the complexity of its marketplace, and the variety innate in our product mean that we need to operate at the state of the art of AI practice. We are committed to long-term innovation to solve complex problems, and to do that we need experienced ML leaders to join us.

The Difference You Will Make:

In this Senior Staff role, you will set technical direction and lead execution for ML evaluation and the end-to-end data flywheel powering CSxAI products (e.g., assistive agents, issue resolution, and tooling). Your work will define how we measure quality, how we turn feedback into learning signals, and how we continuously improve models and products safely and efficiently. You will partner closely with product, engineering, design, and operations to build evaluation systems that are trusted, scalable, and actionable, connecting offline metrics to online outcomes.

A Typical Day:

Work with large-scale structured and unstructured data; explore, experiment, build, and continuously improve Machine Learning models and pipelines for Airbnb product, business, and operational use cases.
Work collaboratively with cross-functional partners, including product managers, operations, and data scientists, to identify opportunities for business impact; understand, refine, and prioritize requirements for machine learning; and drive engineering decisions.
Develop, productionize, and operate Machine Learning models and pipelines hands-on and at scale, including both batch and real-time use cases.
Leverage third-party and in-house Machine Learning tools & infrastructure to develop reusable, highly differentiating, and high-performing Machine Learning systems, and to enable fast model development, low-latency serving, and ease of model quality upkeep.

Your Expertise:

Define evaluation strategy and success metrics for GenAI systems, aligning offline evaluation with online business and customer experience outcomes.
Build and scale evaluation frameworks (golden sets, synthetic data, automated regressions, rubric-based grading, LLM-as-judge where appropriate) with strong controls for bias, drift, and reliability.
Design the data flywheel: instrumentation, feedback collection, data quality checks, labeling strategy, dataset versioning, and governance to support continuous improvement.
Lead cross-functional quality initiatives across product, ops, and engineering, driving clarity on what good looks like and how teams act on evaluation results.
Develop and productionize pipelines for dataset creation, model monitoring, evaluation-at-scale, and continuous testing (pre-deploy and post-deploy).
Drive technical decisions and architecture for evaluation and data infrastructure, balancing speed, rigor, cost, and safety.

Minimum Qualifications:

Educational Background: PhD in Computer Science, Mathematics, Statistics, or related technical field (or equivalent practical experience).
Industry Experience: 10 years building, testing, and shipping ML/AI systems end-to-end, including 2 years of experience with GenAI/LLM systems in production.
Leadership Experience: 5 years leading large, ambiguous technical initiatives as a senior IC, influencing roadmap and engineering/science direction across teams.
Technical Proficiency:

Deep expertise in evaluation methodology (offline/online alignment, metric design, human-in-the-loop evaluation, A/B testing, power analysis, regression testing).
Hands-on experience with GenAI systems, including orchestration, retrieval, tool calling, memory, etc.
Experience building data pipelines and quality systems (labeling workflows, dataset curation, versioning, monitoring, and governance).
Solid ML fundamentals and best practices (model selection, training/serving, monitoring, reliability, and model lifecycle management).

Preferred Qualifications:

Customer Support Systems: Experience applying ML/AI to customer support workflows (e.g., agent assist, classification/routing, resolution recommendation, QA).
Infrastructure & Quality at Scale: Experience building robust evaluation platforms for agent behavior validation, safety/guardrails, and continuous improvement.
Agile Practice for Applied AI: Proven ability to take evaluation and data flywheel work from incubation to production, iterating quickly while maintaining scientific rigor.
Continuous Learner: Strong curiosity and ability to absorb new techniques (e.g., judge models, preference optimization, synthetic data generation) and apply them pragmatically.

Your Location:

This position is US - Remote Eligible. The role may include occasional work at an Airbnb office or attendance at offsites, as agreed to with your manager. While the position is Remote Eligible, you must live in a state where Airbnb, Inc. has a registered entity. See the up-to-date list of excluded states. This list is continuously evolving, so please check back with us if the state you live in is on the exclusion list. If your position is employed by another Airbnb entity, your recruiter will inform you what states you are eligible to work from.

Required Experience:

Staff IC
