Data Engineer

Cambridge - UK

Annual Salary: GBP 45,000 - 65,000

Posted on: 30+ days ago

Vacancies: 1 Vacancy

About Us

We are a leading gastrointestinal health company delivering minimally invasive diagnostics to transform access to esophageal care. Our EndoSign test combines a simple swallowable device with cutting-edge laboratory biomarkers and analytics to detect esophageal cancer and its precursor, Barrett's esophagus.

Operating across US and UK life-science hubs, with hybrid remote and on-site teams, we are expanding our pipeline to address new high-impact targets across gastroenterology and related fields. You'll join a close-knit team of experts in our field who collaborate daily to translate breakthrough ideas into real-world solutions.

At Cyted Health, every voice matters. Whether you're in R&D, Commercialisation, Medical Affairs or Operations, you'll have the chance to lead projects, influence strategy and broaden your skill set across the company. We champion diverse backgrounds and perspectives, fostering an inclusive culture where everyone can thrive and innovate.

If you're inspired by purpose, motivated by challenge and eager to make a meaningful impact on patients' lives, we'd love to hear from you. We usually recruit on a rolling basis, through the following stages:

1. Initial Conversation: An online meeting with a member of our People team or the hiring manager to learn about your skills and experience, and for you to explore what it is like to work with us.
2. Team Interview & Assessment: Meet the wider team, sometimes accompanied by an assessment such as a short presentation on a topic related to the role.
3. Final Interview: An online meeting with our CEO to discuss your goals and the company's history and vision.

Job Summary

As a Data Engineer (Bioinformatics) at Cyted, you'll build the data infrastructure that powers our diagnostics and research. You'll transform experimental workflows into reliable, production-grade data pipelines, implementing reproducible ingestion and analysis processes (primarily using Nextflow) and developing automation and orchestration for both operational and research workloads.
You'll establish strong data governance and observability practices, ensuring datasets are versioned, catalogued and fully traceable from source to output. Security and compliance will be embedded in everything you design, meeting the standards required for regulated healthcare and diagnostics environments.
You'll work closely with computational biologists in R&D and software engineers in the Technology team to translate scientific and product requirements into scalable, maintainable solutions. Alongside delivery, you'll maintain clear technical documentation, contribute to code reviews and help raise engineering standards across the team.

Working Pattern and Location

The role is a full-time position with a standard 37.5-hour working week. The role holder may be required to work flexibly.

The Data Engineer (Bioinformatics) will be based at Cyted's Head Office: Ground Floor, Building 3, Old Swiss, 149 Cherry Hinton Road, Cambridge, United Kingdom, CB1 7BX.

What you will be doing

Pipeline Design and Development

Build, maintain and optimise scalable data ingestion and analysis pipelines using workflow engines such as Nextflow.
Translate scientific and analytical prototypes into robust, reproducible and automated workflows suitable for production use.
Create modular, testable components and establish clear versioning to ensure reproducibility across environments.

Data Architecture and Governance

Design and maintain data models, storage solutions and metadata catalogues that support efficient querying and lineage tracking.
Implement and enforce data governance practices, including data classification, retention policies and access control frameworks.
Maintain comprehensive lineage tracking (e.g. with OpenLineage or equivalent) and ensure auditability of all datasets.

Automation, Monitoring and Reliability

Develop orchestration and scheduling frameworks to automate both operational and R&D pipelines.
Implement observability practices (monitoring, alerting and automated recovery) to ensure high reliability and performance.
Drive continuous improvement in the efficiency, scalability and cost optimisation of data workflows across AWS/GCP/Azure.

Security and Compliance

Embed security-by-design principles into all data handling, including encryption, authentication and secrets management.
Ensure all pipelines and data stores comply with regulatory requirements relevant to diagnostics and healthcare (e.g. ISO 27001, ISO 13485, CLIA/CAP, GDPR).
Contribute to technical documentation and evidence for audits and certification processes.

Collaboration and Communication

Partner with computational biologists and product engineers to define data requirements and shape infrastructure decisions.
Provide technical mentorship and guidance to team members on data engineering best practices.
Document systems and processes through runbooks, design specifications and operational guides.
Contribute to code reviews, internal knowledge-sharing sessions and cross-functional project planning.

Innovation and Continuous Improvement

Evaluate and integrate new technologies to improve data processing, observability and scalability.
Identify and remove bottlenecks in the data lifecycle, from ingestion to reporting, to accelerate insight generation.
Support the adoption of modern DevOps and MLOps approaches for scientific and product data pipelines.

How we work

At Cyted, how we work is just as important as what we're building. Our values shape how we collaborate, innovate and deliver for patients and partners. As our Data Engineer, you'll bring these values to life from day one.

We care deeply about data integrity, patient outcomes and the clinicians who rely on our work. In this role, care means building systems that are accurate, traceable and resilient, because real people depend on the results we generate. You'll take pride in clean code, reproducible pipelines and the knowledge that every dataset you shape contributes to earlier, better diagnosis.

We expect you to own your work and your contribution to your function with confidence and curiosity. You'll be responsible for designing and maintaining the infrastructure that connects our science, operations and technology. You'll take initiative, move with purpose and be trusted to make critical decisions that keep our data ecosystem secure, scalable and compliant.

We aim high. We're scaling fast, working across complex regulated environments and pushing boundaries in how data accelerates diagnostics. You'll be empowered to build with ambition: optimising workflows, streamlining automation and helping define what great data engineering looks like in healthcare.

You'll be expected to dive deep into the science, the systems and the standards. You'll understand the technical and regulatory nuance behind every workflow, and you'll be just as comfortable debugging a Nextflow pipeline as explaining architecture decisions to cross-functional teams. You won't just maintain systems; you'll actively improve them.

We encourage everyone to challenge and commit. You'll help shape how we work as a data-led company by questioning assumptions, sharing ideas and being open to better ways of working. But once we align, you'll deliver with clarity, ownership and precision.

And most of all, we deliver. This is a role for someone who thrives on progress, builds with intent and sees impact in every successful workflow run, every insight delivered and every patient outcome improved.

This is how we work at Cyted, and if this sounds like the environment where you'll do your best work, we'd love to speak with you.

Person Specification

We're looking for a skilled, proactive Data Engineer who's ready to build and scale the infrastructure that powers our scientific and operational insights. The ideal candidate will bring experience working with complex, regulated datasets, a strong grasp of modern data engineering tools and best practices, and the curiosity to solve problems at the intersection of biology and technology. You'll be hands-on, adaptable and motivated to design systems that are reliable, compliant and built to grow in a fast-paced, purpose-driven environment.
To succeed in this role, you'll bring:

A degree in Computer Science, Bioinformatics, Computational Biology or a related field, or equivalent practical experience
2-3 years of industry experience working in a regulated data environment (e.g. biotech, healthtech or clinical diagnostics)
Proven experience designing and maintaining reliable data pipelines on AWS, GCP or Azure
Strong proficiency in Python, with solid Linux/Bash fundamentals
Hands-on experience with at least one workflow engine (e.g. Nextflow, Snakemake)
Familiarity with version control systems (Git, GitHub) and CI/CD best practices
Working knowledge of regulatory frameworks (CLIA, CAP, IVD, ISO 27001, ISO 13485) and audit-readiness requirements
Understanding of NGS data, associated tools and standard QC practices
Experience with data cataloguing and governance platforms (e.g. DataHub), lineage tracking (e.g. OpenLineage) and access control management
Knowledge of Infrastructure-as-Code (e.g. Terraform), identity and access management (IAM), secrets management and cloud cost optimisation at scale
Exposure to the R programming language and to genomics workflows such as RNA-seq, single-cell or structural variant/CNV pipelines
A strong focus on testing, monitoring and observability to ensure data integrity and reliability
Clear, concise communication and a collaborative approach to problem-solving

Benefits

25 days holiday per holiday year plus public holidays
Pension scheme
An annual learning and development budget
Medical insurance including dental and optical cover
Life/critical illness cover
Social events including Christmas and Summer parties
Cycle to work scheme
Electric Vehicle Scheme
Sabbatical after 4 years of service

Key Skills

Apache Hive
S3
Hadoop
Redshift
Spark
AWS
Apache Pig
NoSQL
Big Data
Data Warehouse
Kafka
Scala

About Company

Cyted Health

Cyted is on a mission to build a world where disease is prevented rather than treated. We focus on providing innovative diagnostic technologies to drive the earlier detection of disease in the gastrointestinal tract, first focused on oesophageal cancer (and its precursor Barrett’s oes ...