Staff Software Engineer — Search Platform, Ingestion & Indexing

Thomson Reuters

Job Location:

Eagan, MN - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.

Overview of the Role

Advanced Content Engineering (ACE) is seeking a Staff Software Engineer to serve as the technical anchor for the search platform's ingestion and indexing systems. The platform processes millions of documents across TR's legal, tax, and professional content corpora, parsing, chunking, enriching, embedding, and indexing them into a hybrid search engine that powers both human-facing search interfaces and autonomous AI agents. Getting this pipeline right at scale, with zero-downtime operations and increasingly agentic retrieval patterns, is one of the platform's most consequential engineering challenges.

This role owns the design, implementation, and operational health of the document ingestion pipeline and search index management systems, from the Kafka-based streaming infrastructure that moves documents through processing stages to the Vespa application architecture that stores and serves them. Staff Engineers on this team define, build, test, deploy, scale, and operate what they ship; full-stack ownership is not a principle we aspire to but the daily reality. AI-assisted development is the team norm, not the exception, and constant delivery to production is the expectation. This is a role for someone who sets architectural boundaries, not just someone who executes within them.

About the Role

In this position, you will focus on:

Ingestion Pipeline Architecture & Engineering

Plan, design, develop, and own the end-to-end document ingestion pipeline: a Kafka-based stream processing architecture that moves documents through parsing, chunking, enrichment (entity extraction, embedding generation, metadata enrichment), and indexing stages, including all fault tolerance, version ordering, and at-least-once delivery guarantees

Architect and implement pluggable, configurable pipeline components (parsers, chunkers, enrichers, indexers) that client teams can assemble into custom topologies via the platform's self-service APIs, while maintaining reliable, observable, and performant execution

Own the platform's Protobuf-based document schema and schema registry integration, establishing schema governance standards, enforcing backward-compatible evolution, and ensuring reliable serialization across all pipeline stages

Design and implement dual-flow ingestion: a high-throughput batch path for full reindexing and a low-latency incremental path for real-time document updates, with strong guarantees around document version ordering and idempotent processing

Lead the migration of ingestion infrastructure from OpenSearch to Vespa, including the design of Vespa document processors, custom Kafka feeders, and application package architecture, resolving complex technical challenges that have little or no precedent within the team

Custom Model Operationalization

Own the end-to-end lifecycle for custom models integrated into the ingestion pipeline (re-ranking models, embedding models, and enrichment components), including inference serving behind a stable API surface, latency SLO management, hardware and runtime configuration (batching, quantization), and scaling

Build and operate the model promotion pipeline: the CI/CD workflow that moves a model artifact from the fine-tuning team through staging to production, including versioning, canary rollouts, and rollback mechanisms, so that the platform team can operate model updates independently without depending on the research team for production changes

Define and maintain integration contracts between custom models and downstream pipeline components, governing input/output schemas, compatibility requirements, and the governance process for model updates, so that search pipeline consumers are not broken by upstream changes

Instrument model serving for production observability, covering latency distributions, throughput, error rates, and quality signals such as re-ranking score distributions, so the team can detect regressions or model drift without requiring the fine-tuning team's involvement

Search Engine & Index Management

Own the search engine layer end-to-end: design and operate Vespa (and OpenSearch during the transition) index configurations, ranking profiles, schema definitions, and application package lifecycle management, applying architectural principles that scale to the platform's long-term content and tenancy goals

Build and operate zero-downtime index management: shadow indexing, blue/green index promotion, and rolling reindex workflows that keep the platform available during major infrastructure changes

Implement and maintain the Component Registry and Index Registry (the platform's catalog of reusable processing components and active index configurations), with a focus on correctness, observability, and safe concurrent modification

Develop the full-reindex and incremental-update orchestration logic, including change detection, document tracking, Kafka topic management, and DynamoDB-backed state management

Agentic Search Infrastructure

Design ingestion and indexing infrastructure with agentic retrieval patterns as a first-class concern, including explicit latency budgets per retrieval hop, chunking and result compression strategies optimized for token economy in context windows, and index boundary definitions that give agents clean, predictable tool contracts

Build trace-level observability into the retrieval stack that captures which tools were called, in what order, and with what inputs, enabling reliable diagnosis and reproduction of failures in non-deterministic agentic retrieval paths

Design session state and cache invalidation patterns for multi-turn agentic search, reasoning carefully about cache validity windows, session state scope (per-user, per-session, per-query), and mechanisms that prevent stale context from corrupting downstream agent responses

Evaluation & Search Quality

Build and own the integration between the ingestion pipeline and the platform's offline evaluation framework, ensuring that experiment runs produce query/result outputs that feed seamlessly into the search grading tool, and supporting gold test set maintenance, LLM-as-judge evaluation, and side-by-side ranking comparison across pipeline versions

Instrument the query and retrieval stack for online analytics: real-time query latency and throughput monitoring, query log collection for session analysis, and the infrastructure to support A/B and interleaved ranking experiments in production, generating the signals that connect low-level search metrics to downstream product KPIs

Partner with TR Labs and research scientists to ensure that new search components can be evaluated in isolation, with automated offline evaluation on every build and a clear path from evaluation results to production promotion decisions

Reliability & Operational Ownership

Take full operational responsibility for ingestion and indexing infrastructure: define SLOs, set measurable goals and meet them, build and maintain CloudWatch dashboards and alarms, and participate in on-call rotations. You built it, you own it, you run it

Treat delivery friction as the enemy: identify and remove obstacles that slow the team's ability to ship ingestion and indexing changes to production safely and frequently, making improvements to CI/CD pipelines, deployment automation, and local development workflows a standing priority

Instrument pipeline components with distributed tracing, structured logging, and rich metrics, and establish documentation standards and knowledge management practices so that the team and platform consumers can understand system behavior at all times

Design and implement resilient fault tolerance mechanisms (dead-letter queues, retry strategies with exponential backoff, circuit breakers, consumer lag monitoring) that make the pipeline robust to downstream failures and transient errors

Drive system-level performance architecture: profiling ingestion throughput and indexing latency, identifying bottlenecks, and implementing optimizations that meet platform SLOs under peak load

Technical Leadership

Serve as the team's deepest technical authority on document processing pipelines and search engine internals, guiding architectural decisions, resolving technical ambiguity, and establishing cross-system design patterns that raise the quality bar across the team

Lead significant projects and initiatives that span multiple engineers and interact with other teams; determine work priorities based on strategic direction; recommend modifications to team operations and make needed adjustments to short-term priorities while maintaining strategic focus

Mentor and develop senior and mid-level engineers, providing coaching, technical direction, and educational opportunities in modern distributed systems, stream processing, search infrastructure, and AI-assisted development practices

Collaborate closely with TR Labs and research scientists to integrate new chunking strategies, embedding models, and enrichment techniques into the pipeline in a safe, well-instrumented, and ethically responsible way

Deliver effective presentations on complex technical concepts to both technical and non-technical stakeholders; develop strategic plans for technology implementation that align with business objectives

About You

You're an ideal fit if you have:

Required Experience

Bachelor's or Master's degree in Computer Science, Engineering, or a related field

8 years of software engineering experience with demonstrated progression to staff-level or equivalent technical leadership, including ownership of a functional area and leadership of significant cross-functional projects

Deep expertise in distributed stream processing: designing, building, and operating high-throughput, fault-tolerant, event-driven pipelines using Kafka or equivalent technologies at production scale

Production experience with Vespa, OpenSearch, or Elasticsearch, including schema design, ranking profile configuration, and end-to-end application lifecycle management

Mastery of Python, with strategic awareness of language and framework selection; strong software engineering fundamentals, including test strategy, performance architecture, and system design

Proficiency with AWS cloud services used in data pipeline and search infrastructure (MSK, ECS, Lambda, DynamoDB, Step Functions, CloudWatch), with infrastructure-as-code experience (Terraform or AWS CDK)

Demonstrated ability to take full operational responsibility end-to-end (defining SLOs, building observability, running on-call, and driving systematic improvements from incident retrospectives), with a track record of shipping to production frequently and proactively removing delivery friction

Comfort and fluency with AI-assisted development tools; you use them to move faster and produce higher-quality work, not as a novelty

Track record of establishing architectural principles, cross-system design patterns, and documentation standards that improve the broader team's engineering quality

Preferred Experience

Experience operationalizing ML models in production: inference serving, model promotion pipelines, canary rollouts, and production observability for model quality signals

Familiarity with agentic retrieval patterns: multi-hop retrieval, latency budget management across retrieval hops, context window optimization, and stateful session design

Experience with online search analytics: instrumenting systems for query performance monitoring, A/B or interleaved ranking experiments, and query log analysis to surface relevance gaps

Experience with embedding pipelines, vector indexing, and hybrid (dense and sparse) retrieval architectures in a production context

Familiarity with Protobuf schema design and schema registry governance patterns (Confluent Schema Registry or equivalent)

Experience building self-service or multi-tenant platform infrastructure where reliability and correctness directly affect multiple downstream teams

Background in AI ethics frameworks and responsible deployment of machine learning components in production pipelines

What Success Looks Like

In the first 90 days:

Develop a thorough understanding of the platform's current ingestion and indexing architecture, active technical debt, known reliability gaps, and the roadmap for Vespa adoption

Establish strong working relationships with the search platform team, TR Labs, and key client teams consuming the ingestion pipeline

Take on-call ownership for your functional area and deliver at least one meaningful improvement to pipeline reliability, observability, or delivery automation

In the first year:

Lead the architectural design and delivery of a major phase of the Vespa migration, including ingestion pipeline changes, schema migration, and zero-downtime index promotion, resolving novel technical challenges with minimal precedent

Establish robust SLO coverage and observability across ingestion components, with on-call playbooks, documented architectural decision records, and demonstrated improvement in incident response quality

Deliver a production-ready custom model operationalization framework (inference serving, a promotion pipeline, and observability) for at least one custom model integrated into the ingestion or query stack

Become the recognized technical authority for ingestion and indexing, the person the team and partner organizations turn to for architectural direction in this domain, with demonstrated influence on platform strategy.

What's in It for You

Hybrid Work Model: We've adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow's challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensure you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.

Our use of AI within the recruitment process: Thomson Reuters utilizes Artificial Intelligence (AI) to support parts of our global recruitment process. Unless you opt out, our AI system will assess the information you provide, compare it to the requirements listed for the role, and present the result to our recruitment personnel for further review. The AI system acts as a supporting tool, but there is always a human making the decision on whether you will be considered for the role.

In the United States, Thomson Reuters offers a comprehensive benefits package to our employees. Our benefit package includes market-competitive health, dental, vision, disability, and life insurance programs, as well as a competitive 401k plan with company match. In addition, Thomson Reuters offers market-leading work-life benefits with competitive vacation, sick and safe paid time off, paid holidays (including two company mental health days off), parental leave, and sabbatical leave. These benefits meet or exceed the requirements of paid time off in accordance with any applicable state or municipal laws. Finally, Thomson Reuters offers the following additional benefits: optional hospital, accident, and sickness insurance paid 100% by the employee; optional life and AD&D insurance paid 100% by the employee; Flexible Spending and Health Savings Accounts; fitness reimbursement; access to Employee Assistance Program; Group Legal Identity Theft Protection benefit paid 100% by employee; access to 529 Plan; commuter benefits; Adoption & Surrogacy Assistance; Tuition Reimbursement; and access to Employee Stock Purchase Plan.

Thomson Reuters complies with local laws that require upfront disclosure of the expected pay range for a position. The base compensation range varies across locations. Eligible office location(s) for this role include one or more of the following: New York City; San Francisco, Los Angeles, and/or Irvine, CA; McLean, VA; Washington, DC. The base compensation range for the role in any of those locations is $136,000 USD - $253,000 USD. For any eligible US locations, unless otherwise noted, the base compensation range for this role is $118,400 USD - $219,800 USD. For Ontario, Canada, the base compensation range for this role is $140,600 CAD - $190,600 CAD. Base pay is positioned within the range based on several factors, including an individual's knowledge, skills, and experience, with consideration given to internal equity. Base pay is one part of a comprehensive Total Reward program, which also includes flexible and supportive benefits and other wellbeing programs. This role may also be eligible for an Annual Bonus based on a combination of enterprise and individual performance.

About Us

Thomson Reuters informs the way forward by bringing together the trusted content and technology that people and organizations need to make the right decisions. We serve professionals across legal, tax, accounting, compliance, government, and media. Our products combine highly specialized software and insights to empower professionals with the data, intelligence, and solutions needed to make informed decisions, and to help institutions in their pursuit of justice, truth, and transparency. Reuters, part of Thomson Reuters, is a world-leading provider of trusted journalism and news.

We are powered by the talents of 26,000 employees across more than 70 countries, where everyone has a chance to contribute and grow professionally in flexible work environments. At a time when objectivity, accuracy, fairness, and transparency are under attack, we consider it our duty to pursue them. Sound exciting? Join us and help shape the industries that move society forward.

As a global business, we rely on the unique backgrounds, perspectives, and experiences of all employees to deliver on our business goals. To ensure we can do that, we seek talented, qualified employees in all our operations around the world, regardless of race, color, sex/gender (including pregnancy), gender identity and expression, national origin, religion, sexual orientation, disability, age, marital status, citizen status, veteran status, or any other protected classification under applicable law. Thomson Reuters is proud to be an Equal Employment Opportunity Employer providing a drug-free workplace.

Thomson Reuters makes reasonable accommodations for applicants with disabilities, including veterans with disabilities, and for sincerely held religious beliefs in accordance with applicable law. If you reside in the United States and require an accommodation in the recruiting process, you may contact our Human Resources Department at. Disability accommodations in the recruiting process may include things like a sign language interpreter, making interview rooms accessible, providing assistive technology, or other relevant accommodations. Please note this email is not intended for general recruitment questions, and we will promptly respond to inquiries regarding accommodations. More information on requesting an accommodation here.

Learn more on how to protect yourself from fraudulent job postings here.

More information about Thomson Reuters can be found on

Required Experience:

Staff IC