Description
Our Future Health will be the UK's largest ever health research programme, bringing people together to develop new ways to detect, prevent and treat diseases. Diseases like cancer, dementia, diabetes and heart disease affect the lives of many people in our communities. Our goal is to create a world-leading resource for health research to improve our understanding and spot the patterns of how and why common diseases start, so treatments can begin sooner and be more effective.
Our plan is to bring together 5 million volunteers from right across the UK, who will be asked to contribute information to help build one of the most detailed pictures we have ever had of people's health. Researchers will be able to use this information to make new discoveries about human health and diseases, so that future generations can live in good health for longer.
We are looking for a Senior Data Engineer to bring an in-depth knowledge of health and survey data and data solutions to help solve some of the key challenges around a programme of work at industrial scale with global significance.
- You will write and contribute code to a complex code base responsible for delivering data to researchers on an industrial scale.
- You will work with health records and build pipelines and systems to process them, control quality, and create data releases for researchers.
Our Senior Data Engineer will have experience contributing to complex code bases shared with multiple engineers. You know how to create repeatable and reusable know how to communicate to and between technical and non-technical stakeholders, as well as facilitate discussions within a multidisciplinary team including scientists, software engineers, product managers and other data engineers.
Essential Duties and Responsibilities:
- Support the build of data pipelines from data providers to our primary data store and trusted research environment, and the design, scoping and build of dataflows.
- Produce logic for data transformation steps as code which meets the requirements for our end users and builds well-curated, accessible and quality-controlled data for analysis.
- Contribute to the code base for multiple data pipelines while ensuring best coding practices are used.
- Work with data scientists and epidemiologists to understand the data requirements and work with them to deliver the data needed for their projects.
- Keep abreast of best practice in data engineering across industry, research and Government, and facilitate the adoption of standards.
Requirements
- Experience building and maintaining robust, scalable and efficient data pipelines. Capable of processing very large amounts of data based on feeds from multiple systems using a range of different technologies.
- Can listen to the needs of technical and business stakeholders, interpret them, and effectively manage stakeholder expectations.
- Experience working with health data (ideally NHS or survey/questionnaire data). Experience working with well-known secondary care datasets (Hospital Episode Statistics, death registry data, A&E data, etc.) as well as primary care (GP data), and with survey/questionnaire data and standards (REDCap), would be advantageous.
- Highly proficient in Python with solid command line knowledge and Unix skills.
- Good understanding of cloud environments (ideally Azure), distributed computing, and optimising workflows and pipelines.
- Understanding of common data transformation and storage formats, e.g. Apache Parquet, Delta tables.
- Understanding of containerisation (e.g. Docker) and deployment (e.g. Kubernetes).
- Experience with Spark, Databricks and data lakes.
- Follow best practices like code review, clean code and unit tests.
- Experience working in an agile development team. Familiar with version control and Git/GitHub.