About You
Are you excited about turning messy, multi-jurisdiction legal content into clean, structured, AI-ready data? Do you enjoy building reliable pipelines for extraction, normalization, chunking, citation handling, tagging, structuring, summarizing, and indexing, then measuring quality and cost? Do you thrive in a fast-paced startup where your work directly powers search, AI answer quality, and analytics? If so, we'd love to hear from you!
About Omnilex
Omnilex is a young, dynamic AI legal tech startup with its roots at ETH Zurich. Our passionate, interdisciplinary team of 10 people is dedicated to empowering legal professionals in law firms and legal teams by leveraging the power of AI for legal research and answering complex legal questions. We already stand out with our strong data engineering, including our combination of external data, customer-internal data, and our own innovative AI-first legal commentaries.
Tasks
Your Responsibilities
- As a Data Engineer focused on AI data processing and integration, you will build and own the data flows that make our AI features accurate, explainable, and scalable.
- Design and maintain ingestion for legal sources (APIs, scraping, bulk data) across jurisdictions with strong reliability and compliance
- Normalize and model heterogeneous sources into pragmatic, typed schemas (statutes, decisions, commentaries, citations, metadata)
- Implement citation-aware chunking, sectioning, and cross-referencing so RAG is precise, traceable, and cost-efficient
- Build enrichment pipelines for tagging, classification, summarization, embeddings, entity extraction, and graph relationships, using AI where it helps
- Improve search quality via better indexing strategies, analyzers, synonyms, ranking, and relevance evaluation
- Establish data quality, lineage, and observability (QA checks, coverage metrics, regression tests, versioning)
- Optimize performance: runtime complexity, DB query times, token usage, and overall pipeline cost
- Collaborate closely with users and customers to translate user problems and company requirements into robust data and SLAs
- Communicate your work and findings to the team for continuous feedback and improvement (in English)
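To give a concrete flavor of the citation-aware chunking mentioned above, here is a minimal TypeScript sketch (an illustration only, not Omnilex's actual pipeline): it splits a statute into section-level chunks and carries each section's citation as metadata, so retrieved passages stay traceable in a RAG setting. The article-heading pattern is a simplified assumption; real statutes need jurisdiction-specific rules.

```typescript
// A chunk keeps its citation alongside its text, so a RAG answer
// can always point back to the exact provision it came from.
interface Chunk {
  citation: string;
  text: string;
}

// Split on article headings like "Art. 1", "Art. 2" using a zero-width
// lookahead, so each heading stays attached to the text that follows it.
function chunkByArticle(statute: string): Chunk[] {
  const parts = statute.split(/(?=Art\.\s*\d+)/).filter((p) => p.trim());
  return parts.map((p) => {
    const match = p.match(/^Art\.\s*\d+/);
    return {
      citation: match ? match[0] : "unknown",
      text: p.trim(),
    };
  });
}

const sample =
  "Art. 1 Every person must act in good faith. Art. 2 The law applies to all.";
const chunks = chunkByArticle(sample);
// Each chunk now carries "Art. 1" / "Art. 2" as its citation.
```

In a production pipeline, the same idea extends to nested structure (paragraphs, lettered clauses) and to cross-reference extraction, but the core invariant is the one shown: no chunk leaves the pipeline without its citation.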
Requirements
Minimum qualifications
- Degree in Computer Science, Data Science, or a related field; or equivalent practical experience
- Strong hands-on experience in data engineering with TypeScript
- Solid grasp of data structures, algorithms, regexes, and SQL (PostgreSQL)
- Experience using LLMs/embeddings for practical data tasks (chunking, tagging, summarization, RAG-ready pipelines)
- Ability to learn quickly and adapt to a dynamic startup environment, with strong ownership and a product mindset
- Full-time availability; on-site in Zurich at least two days per week (hybrid)
Preferred qualifications
- You have a Swiss work permit or EU/EFTA citizenship
- Working proficiency in German (much of our legal data is in German) and proficiency in English
- Experience with Azure (incl. Azure AI/Cognitive Search), Docker, and CI/CD
- Familiarity with modern scraping/parsing stacks (Playwright/Puppeteer, PDF tooling, OCR)
- Experience with vector indexing, relevance evaluation, and search ranking
- Familiar with our stack: Azure / NestJS /
- Knowledge of and experience with legal systems, in particular Switzerland, Germany, and the USA
Benefits
- Direct impact: your pipelines immediately improve search, answers, and user trust, transforming legal research
- Autonomy & ownership: own the full flow across ingestion, processing, enrichment, and indexing
- Team: professional growth at the intersection of legal, data, and AI with an interdisciplinary team
- Compensation: CHF per month plus ESOP (employee stock options), depending on experience and skills
We're excited to hear from candidates who are passionate about data engineering and eager to make an impact in the legal tech space. Apply today by clicking the Apply button.