Senior Staff Applied AI Engineer Context Retrieval

Databricks


Job Location:

San Francisco, CA - USA

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1 Vacancy

Job Summary

P-1549

At Databricks we are passionate about enabling data teams to solve the world's toughest problems, from making the next mode of transportation a reality to accelerating the development of medical breakthroughs. We do this by building and running the world's best data and AI infrastructure platform, so our customers can use deep data insights to improve their business.

The Mission

Databricks agents are only as good as the context they can retrieve. Whether an agent is answering a question about last quarter's revenue, debugging a failing job, generating SQL against a 10,000-table lakehouse, or summarizing a wiki page, its quality is bounded by what it can find and how well it understands what it finds.

We are hiring a Senior Staff Applied AI Engineer to own context retrieval for Databricks agents across SaaS providers. This is a zero-to-one role with two deeply connected charters:

  1. Build the retrieval stack: query understanding, content understanding, ranking, retrieval, and evaluation across enterprise SaaS data stored in multiple systems.
  2. Build the search subagents that sit on top of that stack and reason about what context is needed, how to retrieve it, and whether the right thing actually came back, closing the loop between an agent's intent and the substrate that serves it.
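
The loop described in the second charter, retrieve, judge sufficiency, refine, or signal failure, can be sketched as below. This is a minimal illustration; every function name here is a hypothetical stand-in, not an existing Databricks API:

```python
# Hypothetical sketch of a search subagent's outer loop: retrieve context,
# judge whether it is sufficient, refine the query if not, and signal
# failure rather than hand back weak context. All callables are
# illustrative stand-ins supplied by the caller.

def search_subagent(question, search, is_sufficient, refine, max_hops=3):
    query, evidence = question, []
    for _ in range(max_hops):
        results = search(query)              # hit the retrieval stack
        evidence.extend(results)
        if is_sufficient(question, evidence):
            return {"status": "ok", "context": evidence}
        query = refine(question, evidence)   # follow-up query when results are weak
    # Closing the loop: report failure upstream instead of returning weak context.
    return {"status": "insufficient", "context": evidence}
```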

If you have deep Information Retrieval wisdom, have shipped retrieval systems for RAG and agentic workloads, and want to build both the substrate and the agents on top of it that make every Databricks agent measurably smarter, this role is for you.

What You Will Do

  • Build the full retrieval stack from scratch. Own the end-to-end system: query understanding, content understanding and indexing, hybrid retrieval, ranking, and evaluation. Make the architectural calls that will define how Databricks agents access context for years to come.
  • Retrieve across heterogeneous data, structured and unstructured. Index and rank across structured assets (tables, columns, SQL queries, dashboards, code, notebooks, jobs) and unstructured content (docs, wikis, tickets, chat, images, video, audio). Each modality has its own signals; design retrieval that exploits them rather than flattens them.
  • Connect to the SaaS surface area customers actually use. Build connectors and retrieval adapters for the systems where enterprise knowledge lives. Treat each retrieval source with its own freshness, permissions, and ranking signals.
  • Optimize for two consumers at once. Retrieval must serve both LLMs (grounded, token-efficient, hallucination-resistant context) and humans (intuitive, explainable discovery). These are different objectives and require different signals; own both.
  • Crack query understanding for agents. Agent queries don't look like web queries. Build query rewriting, decomposition, intent classification, and entity resolution tuned for multi-turn agentic workflows.
  • Crack content understanding at scale. Build the pipelines that extract structure, entities, embeddings, summaries, and metadata from every supported asset type, and keep them fresh as customer data evolves.
  • Build search subagents that reason about retrieval. Design the agentic layer that decides what context is needed, which sources to query, how to decompose and route the search, and, critically, whether the retrieved content is actually sufficient to answer the question. These subagents will plan multi-hop searches, issue follow-up queries when results are weak, ground claims against retrieved evidence, and hand back high-confidence context (or signal failure) to upstream agents. This is where IR meets agentic reasoning.
  • Build the evaluation flywheel for both retrieval and subagents. Stand up offline evals (nDCG, MRR), LLM-as-judge harnesses, human-in-the-loop labeling, and online experimentation. Extend evaluation beyond ranking metrics to measure subagent decision quality: did it ask the right follow-up, did it correctly recognize when retrieval failed, did it ground its answer in the right evidence? Quality you can't measure is quality you can't ship.
  • Set technical direction and grow the team. Set the multi-year roadmap, mentor senior engineers, partner with Research, Product, and Platform leaders, and raise the technical bar across the org.
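
As a flavor of the hybrid retrieval mentioned above, one common way to combine a lexical ranking with a dense (embedding) ranking is reciprocal rank fusion (RRF). The sketch below is a generic illustration, not the actual Databricks implementation; the document ids, the two rankings, and the constant k=60 are illustrative assumptions:

```python
# Minimal sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# Each document's fused score is the sum over input rankings of
# 1 / (k + rank), where rank is the document's 1-based position.

def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# The lexical and dense rankings disagree; RRF rewards documents that
# rank high in either list without needing comparable raw scores.
lexical = ["jobs_dashboard", "revenue_table", "wiki_page"]
dense = ["revenue_table", "notebook_q3", "jobs_dashboard"]
fused = rrf_fuse([lexical, dense])
```

RRF is attractive here precisely because BM25 scores and cosine similarities live on incompatible scales; fusing on ranks sidesteps score calibration entirely.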

What We're Looking For

  • 10 years of software engineering experience, with significant time spent building production retrieval, search, or RAG systems at scale.
  • Deep Information Retrieval (IR) expertise: lexical retrieval (BM25, Lucene/Elasticsearch/OpenSearch), dense retrieval (embeddings, ANN indexes, FAISS, ScaNN, HNSW), hybrid retrieval, and learning-to-rank.
  • Hands-on experience with modern LLM-era retrieval: RAG architectures, query rewriting, re-ranking with cross-encoders, long-context strategies, and grounding techniques that reduce hallucination.
  • Experience designing agentic systems on top of retrieval: search planners, multi-hop / iterative retrieval, self-reflection and sufficiency checks, and tool-using agents that decide what to fetch and verify what came back.
  • Strong grasp of relevance evaluation: nDCG, MRR; offline/online experimentation; LLM-as-judge frameworks; building human labeling pipelines.
  • Experience working across structured and unstructured data: you've indexed and ranked over tables, code, and documents in the same system, and have opinions about how to do it well.
  • Track record of building 0→1: you've stood up a retrieval system from an empty repo, made the foundational architectural decisions, and grown it into something that customers depend on.
  • Demonstrated ability to operate as a technical leader: setting direction across teams, mentoring senior engineers, and influencing roadmaps with Research, Product, and Platform partners.
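
For reference, the two ranking metrics named in the list above can be computed from binary relevance labels as follows. This is a generic textbook sketch with illustrative inputs, not an evaluation harness from the role:

```python
import math

# Minimal sketches of nDCG and MRR over ranked relevance labels
# (1 = relevant, 0 = not), using the standard log2 position discount.

def dcg(relevances):
    # Discounted cumulative gain: gain at rank i is discounted by log2(i + 1)
    # for 1-based ranks (hence i + 2 over a 0-based enumeration).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (relevance-sorted) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def mrr(ranked_relevances_per_query):
    # Mean reciprocal rank of the first relevant result per query.
    total = 0.0
    for rels in ranked_relevances_per_query:
        for i, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(ranked_relevances_per_query)
```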

Nice to Have

  • Experience building retrieval over enterprise SaaS sources (permissions, freshness, multi-tenancy, ACL-aware indexing).
  • Background in agentic systems, tool use, or multi-turn retrieval for LLM agents.
  • Contributions to open-source IR/search projects, or publications at SIGIR, KDD, WWW, EMNLP, or similar venues.
  • Experience training or fine-tuning embedding models, rerankers, or query understanding models.

Why This Role

  • Foundational impact. Retrieval is the single biggest lever on agent quality. The stack you build will sit underneath every Databricks agent and every customer-built agent on our platform.
  • Greenfield with scale. You get the rare combination of starting from a clean sheet and having immediate access to massive enterprise scale, real customer data, and a world-class research org.
  • The right team. You'll work alongside the engineers and researchers behind the Lakehouse, Apache Spark, Delta Lake, MLflow, MosaicML, and DBRX.

Location

This role is based in our Mountain View, CA or San Francisco, CA office. Hybrid in-office collaboration is expected.


Required Experience:

Staff IC


About Company


The Databricks Platform is the world’s first data intelligence platform powered by generative AI. Infuse AI into every facet of your business.
