Data Engineer (OCR & Data Pipelines, Contract)

Intelance

Job Location:

London - UK

Daily Salary: GBP 450 - 750
Posted on: 1 hour ago
Vacancies: 1 Vacancy

Job Summary

Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean, senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing.

We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract / freelance role (2-3 days/week), working closely with our AI Solution Architect, Lead ML Engineer and Integration Engineer.

Tasks

  • Design and implement the end-to-end data pipeline for the project:
      • Ingest PDF/Word reports from secure storage.
      • Run OCR / text extraction and layout parsing.
      • Normalise structure and validate the data.
      • Store outputs in a form ready for ML and integration.
  • Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar) and wrap them in robust, retry-safe, cost-aware scripts/services.
  • Define and implement data contracts and schemas between ingestion, ML and integration components (JSON/Parquet/relational as appropriate).
  • Build quality checks and validation rules (field presence, format and range checks, duplicate detection, basic anomaly checks).
  • Implement logging, monitoring and lineage so every processed document can be traced from source > OCR > structured output > model input.
  • Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation and explainability.
  • Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client's assessment system (API, CSV exports or SFTP drop-zone).
  • Follow good security and privacy practices in all pipelines: encryption, access control, least privilege and redaction where needed.
  • Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
  • Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.
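
To give a feel for the validation and lineage tasks above, here is a minimal Python sketch. It is an illustration only, not the project's actual design: the field names (`report_id`, `sample_date`, `score`), the score range, the `example-ocr` engine label and the `LineageRecord` shape are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical field names and rules, chosen only for illustration.
REQUIRED_FIELDS = ("report_id", "sample_date", "score")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")
SCORE_RANGE = (0.0, 2.0)


@dataclass
class LineageRecord:
    source_sha256: str  # hash of the source document bytes
    ocr_engine: str     # label of the OCR service that produced the text
    processed_at: str   # UTC timestamp in ISO 8601 form
    errors: list = field(default_factory=list)


def validate(record: dict, seen_ids: set) -> list:
    """Return human-readable validation errors (empty list if clean)."""
    errors = []
    # Field presence
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            errors.append(f"missing field: {name}")
    # Format check
    date = record.get("sample_date")
    if date and not DATE_PATTERN.match(str(date)):
        errors.append(f"bad date format: {date}")
    # Range check
    score = record.get("score")
    if isinstance(score, (int, float)):
        if not SCORE_RANGE[0] <= score <= SCORE_RANGE[1]:
            errors.append(f"score out of range: {score}")
    # Duplicate detection across the current batch
    rid = record.get("report_id")
    if rid:
        if rid in seen_ids:
            errors.append(f"duplicate report_id: {rid}")
        else:
            seen_ids.add(rid)
    return errors


def process(source_bytes: bytes, record: dict, seen_ids: set,
            ocr_engine: str = "example-ocr") -> LineageRecord:
    """Attach a source hash and validation result to one extracted record."""
    return LineageRecord(
        source_sha256=hashlib.sha256(source_bytes).hexdigest(),
        ocr_engine=ocr_engine,
        processed_at=datetime.now(timezone.utc).isoformat(),
        errors=validate(record, seen_ids),
    )
```

Calling `process(pdf_bytes, extracted_record, seen_ids)` once per document yields a record that ties the structured output back to the exact source file via its SHA-256 hash, which is one simple way to support the source > OCR > structured output trace.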

Requirements

Must-have

  • 3-5 years of hands-on Data Engineering experience.
  • Strong Python skills, including building and packaging data processing scripts or services.
  • Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI or equivalent).
  • Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you're comfortable switching).
  • Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
  • Experience implementing data quality checks, logging and monitoring for pipelines.
  • Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
  • Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
  • Available for 2-3 days per week on a contract basis, working largely remotely in UK or close European time zones.

Nice-to-have

  • Experience in healthcare, life sciences, diagnostics or other regulated environments.
  • Familiarity with Azure Data Factory, Azure Functions, Databricks or similar orchestration/compute tools.
  • Knowledge of basic MLOps concepts (feature stores, model input/output formats).
  • Experience with SFTP-based exchanges and batch integrations with legacy systems.

Benefits

  • Core impact role: you own the pipeline that makes the entire AI solution possible; without you, nothing moves.
  • Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
  • Lean senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
  • Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
  • Contract / freelance: competitive day rate with potential extension into further phases and additional schemes if the pilot is successful.
  • Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.

We review every application personally. If there's a good match, we'll invite you to a short call to walk through the project, expectations and next steps.

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company

Intelance is a strategic consultancy specialising in Enterprise Architecture, AI transformation, and cybersecurity. We help organisations design the systems, structures, and operating models needed to scale, secure, and lead in a volatile world. Our team combines TOGAF-based architect ...
