TITLE: Data Engineer
ORGANIZATION: Colorectal Cancer Alliance
LOCATION: Washington, DC-based, Hybrid
POSITION TYPE: Full-Time, Exempt
REPORTS TO: Chief Data & Analytics Officer
COMPENSATION: $130,000-$150,000 annual salary; healthcare benefits are available for this role.
ORGANIZATION OVERVIEW:
The Colorectal Cancer Alliance is a national nonprofit organization committed to ending colorectal cancer within our lifetime. We help patients, families, survivors, and caregivers navigate diagnosis and treatment options, connect them with those who can share experiences and knowledge, and identify resources to meet their needs. We partner with healthcare professionals and social influencers to raise awareness of preventative screening, and we collaborate with researchers to better understand the disease and fund critical research. Our efforts are urgent, effective, and efficient because we believe that tomorrow can't wait.
POSITION OVERVIEW:
At the Colorectal Cancer Alliance, we are building an innovative, patient-centric, data-driven precision oncology platform to transform the future of colorectal cancer awareness, care, and research. Our core systems, BlueHQ, BlueLake, and KSPY, are designed to empower patients, caregivers, healthcare providers, and researchers through scalable, interoperable, and AI-ready architectures.
The Data Engineer will lead the design, development, and optimization of cloud-based data pipelines that fuel our real-world data and precision oncology platforms: BlueHQ, BlueLake, and KSPY.
This role focuses on extracting, transforming, harmonizing, and delivering high-quality clinical, patient-reported, and engagement data from sources such as REDCap, Salesforce NPC, AWS services, and external health systems into our federated research and analytics environments.
The ideal candidate brings deep technical expertise in AWS-native data engineering, a strong foundation in HIPAA-compliant data workflows, and a passion for enabling longitudinal patient journeys, clinical trial matching, and real-world evidence generation through structured, governed, and scalable datasets.
You will collaborate closely with platform engineers, navigators, and research stakeholders to build the data foundation supporting advanced analytics, AI/ML models, patient navigation and care, and future patient-centered discoveries.
POSITION RESPONSIBILITIES:
Key responsibilities include, but are not limited to:
- Design, build, and optimize scalable data pipelines to extract, transform, and load (ETL/ELT) data from REDCap, Salesforce NPC, AWS-based platforms (HealthLake, Redshift, S3), and external partners into BlueLake and downstream analytics environments.
- Develop and maintain APIs, connectors, and integration workflows for seamless real-time or near-real-time data movement across federated systems, supporting zero-copy data federation (Athena, Redshift Spectrum) and other diverse data flows.
- Model and harmonize clinical, patient-reported, navigation, trial, and real-world data into semantically enriched datasets aligned to FHIR, OMOP, RDF, and internal Single Canonical Form (SCF) standards.
- Collaborate closely with Data Managers, Systems Engineers, and Platform Architects to operationalize metadata-driven architectures, event-driven data contracts, and interoperable patient-centric frameworks.
- Partner with metadata governance systems (DataHub or similar) to maintain data lineage, provenance, semantic clarity, and compliance across the platform ecosystem.
- Support early-stage design, storage, and querying of linked knowledge graphs (e.g., RDF/SPARQL) connecting patients, biomarkers, navigation events, trials, and outcomes.
- Implement dbt models, CI/CD pipelines, and version control practices for structured, auditable data transformations ready for analytics, AI/ML models, and predictive patient navigation.
- Build robust validation, monitoring, and data quality frameworks to ensure data integrity, reliability, and compliance with HIPAA, GDPR, 21 CFR Part 11, and IRB-approved protocols.
- Support longitudinal tracking of patient journeys, including survivorship, recurrence, biomarker evolution, navigation milestones, and real-world outcomes across federated, multi-source environments.
- Develop scalable, audit-friendly processes for ongoing real-world data (RWD) collection, multi-source harmonization, semantic enrichment, and analytical enablement.
- Maintain detailed technical documentation for pipelines, transformations, and governed data assets to ensure operational transparency, reproducibility, and audit readiness.
- Collaborate actively across navigation, research, analytics, development, and data governance teams to ensure the interoperability, usability, and strategic advancement of the organization's data infrastructure.
REQUIRED QUALIFICATIONS:
- Minimum of 5 years of experience in data engineering, ETL/ELT pipeline development, and cloud-based data environments.
- AWS Certified Specialty certification required (must be active).
- Strong proficiency with AWS-native data services, including AWS Glue, Redshift (or Redshift Spectrum), S3, Lake Formation, and Athena; familiarity with AWS HealthLake and SageMaker is preferred.
- Proven expertise in building scalable ETL/ELT pipelines using Python, SQL, and dbt (data build tool).
- Experience integrating structured and semi-structured data sources (e.g., REDCap, Salesforce, CSVs, FHIR JSON) into centralized or federated repositories.
- Solid understanding of data modeling (normalized, star, and snowflake schemas), semantic enrichment principles, and zero-copy/federated architecture approaches.
- Hands-on knowledge of metadata-driven pipeline design, data governance concepts (lineage, cataloging, privacy, security), and regulatory compliance frameworks (HIPAA, GDPR, 21 CFR Part 11).
- Familiarity with healthcare and research interoperability standards such as FHIR, OMOP, HL7, or RDF/semantic web technologies.
- Strong experience monitoring and validating data quality across pipelines (completeness, consistency, accuracy) and with data observability.
- Comfort with API-first architectures, including RESTful APIs and GraphQL-based data interactions.
- Experience with real-time or event-driven data ingestion using tools such as Kafka, Kinesis, or AWS EventBridge.
- Excellent technical writing, documentation, and communication skills.
PREFERRED QUALIFICATIONS:
- AWS Certified Cloud Practitioner and Solutions Architect (Associate or Professional) certification.
- Deep familiarity with REDCap database structures, MySQL backends, and API-driven data extraction workflows.
- Experience supporting clinical trial data management, patient registries, or real-world evidence (RWE) studies.
- Hands-on experience implementing metadata cataloging platforms such as Atlan, DataHub, or Amundsen and designing metadata-driven ingestion frameworks.
- Knowledge of healthcare and clinical research data standards, including CDISC, OMOP, FHIR, and NCIT ontologies.
- Expertise working with de-identified and limited datasets under HIPAA Safe Harbor and Expert Determination methodologies.
- Familiarity with Salesforce Nonprofit Cloud and MuleSoft-mediated system integrations.
- Experience designing and maintaining knowledge graphs or semantic integration platforms (e.g., AWS Neptune, Stardog).
- Proficiency building federated patient cohorts across multi-source environments to support clinical research and RWE generation.
- Exposure to event-driven architectures (e.g., AWS EventBridge) and serverless data ingestion patterns.
- Experience preparing datasets for machine learning applications (e.g., SageMaker Feature Store pipelines).
- Strong proficiency with CI/CD practices for data workflows (e.g., dbt Cloud, GitHub Actions).
SALARY RANGE:
Competitive nonprofit salary, typically ranging from $130,000 to $150,000, based on experience and qualifications.
STATEMENT OF NONDISCRIMINATION: The Colorectal Cancer Alliance does not discriminate on the basis of race, color, gender, disability, age, religion, sexual orientation, nationality, or ethnicity. We are strongly committed to hiring a diverse and multicultural staff and encourage applications from all backgrounds.
HOW TO APPLY: To apply, please complete the application in our ADP Workforce Now application portal.
To see all employment opportunities at the Alliance, please visit our careers site.
If you encounter any issues with this application please contact us at