Description
We're hiring a Senior Data Engineer (Genetics) to join us on a 12-month FTC to support maternity cover. This role sits within our Bioinformatics Team, working on genetic data and building pipelines to process, control and create data releases for researchers.
Our Future Health is the UK's largest ever health research programme, bringing people together to develop new ways to detect, prevent and treat diseases. We are a charity supported by the UK Government, in partnership with charities and industry. We work closely with the NHS and with public authorities across all nations and regions of the UK.
Our plan is to bring together 5 million volunteers from right across the UK, who will be asked to contribute information to help build one of the most detailed pictures we have ever had of people's health. Researchers will be able to use this information to make new discoveries about human health and diseases, so that future generations can live in good health for longer.
What You'll Be Doing:
- Support the build of production-level data pipelines from data providers to our primary data store and Trusted Research Environment. Work closely with the Lead Data Engineer on key designs and features.
- Build and maintain pipelines that meet the requirements of our end users and produce well-curated, accessible and quality-controlled data for analysis.
- Keep abreast of best practice in data engineering across industry, research and Government, and facilitate the adoption of standards. Promote the adoption of best practices across the squad (unit testing, CI/CD).
- Work with our Science and Product teams to understand their data requirements and deliver the data needed for their projects.
Requirements
To be successful in this role, you will need to have experience of some of the following:
- Experience working in an agile development team.
- Comfortable building and maintaining robust, scalable and efficient data pipelines that run in the cloud. Capable of processing very large amounts of data received daily from feeds across multiple systems, using a range of different technologies.
- Can listen to the needs of technical and business stakeholders, interpret them, and effectively manage stakeholder expectations. Can write ODPs/RFCs or equivalent, drive discussions within the squad, and help the Lead Data Engineer supervise and drive specific initiatives of work.
- Strong experience working with genetic data (ideally genotype and imputation data). Detailed understanding of common bioinformatics file formats (VCF, BAM/CRAM, GTC, FastQ, etc.) and accompanying tools (bcftools, PLINK, QCtools, etc.).
- Experience in validating and QCing complex genomic datasets.
- Highly proficient in Python with solid command line knowledge and Unix skills.
- Highly proficient in working with cloud environments (ideally Azure), distributed computing, and optimising workflows and pipelines.
- Experience working with common data transformation and storage formats, e.g. Apache Parquet, Delta tables.
- Strong experience working with containerisation (e.g. Docker) and deployment (e.g. Kubernetes).
- Experience with Spark, Databricks and data lakes.
- Highly proficient in working with version control and Git/GitHub.
Join us - let's prevent disease together.
We advise not delaying your application as this advert may close early if a high number of applications are received.
At Our Future Health, we recognise the importance of having a diverse workforce and ensuring that all candidates, regardless of their background, have equitable access to our application process. We proactively encourage applicants who identify as having a disability, neurodiversity or long-term health conditions to let us know if they require any reasonable adjustments as part of their application process.
If you do require any reasonable adjustments please email us at
Required Experience:
Senior IC