Job Summary
The Data Scientist designs, builds, and operates robust, secure data pipelines that power clinical-research products, analytics dashboards, and downstream data-science workloads. This role partners closely with clinicians, investigators, the Office of Information Technology, Business Intelligence teams, and external research collaborators to translate complex biomedical data into actionable insights.
Essential Duties And Responsibilities
- Architect end-to-end pipelines that ingest high-volume, de-identified clinical, genomic, and phenotypic datasets from collaborators' EHR systems (Epic Clarity/Caboodle) and cloud storage.
- Build and host production-grade web portals and REST APIs for secure researcher/clinician access, supporting role-based permissions and audit trails.
- Leverage OpenAI LLMs (or similar NLP services) to auto-extract Human Phenotype Ontology (HPO) terms from de-identified clinical documentation (a minimal sketch follows this list).
- Design high-throughput ETL workflows that parse heterogeneous datasets for ingestion into relational databases and cloud-native warehouses, feeding results into downstream analytics pipelines.
- Design and develop real-time-capable analytical systems that integrate with and/or augment EHR systems.
- Perform systems administration for data-platform hosts, including system hardening, patch management, and firewall configuration.
- Implement monitoring stacks and custom health checks to maintain near-continuous system availability.
- Translate clinical-research requirements into technical specifications, producing clear data-model diagrams, lineage documentation, and data-dictionary artifacts.
- Deliver data-product demos to investigators, effectively showcasing how pipeline outputs support precision-medicine reporting.
- Champion standards for metadata management, schema versioning, and test-driven data engineering.
- Other duties as assigned.
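For illustration only, a minimal sketch of the LLM-based HPO extraction duty above, assuming the openai Python package (v1+); the model choice, prompt wording, and output schema are placeholders, not the team's actual pipeline:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_hpo_terms(note: str) -> list[dict]:
    """Ask the model for HPO terms in one de-identified note; parse its JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any JSON-mode chat model works
        messages=[{
            "role": "user",
            "content": (
                "From the de-identified clinical note below, identify Human "
                "Phenotype Ontology (HPO) terms. Respond with a JSON object "
                'shaped like {"terms": [{"hpo_id": "HP:0001250", "label": "Seizure"}]}.'
                "\n\nNote:\n" + note
            ),
        }],
        response_format={"type": "json_object"},  # constrain the reply to valid JSON
    )
    return json.loads(response.choices[0].message.content)["terms"]

if __name__ == "__main__":
    print(extract_hpo_terms("Patient exhibits short stature and recurrent seizures."))

In practice, such output would be validated against the official HPO release and written to the warehouse alongside provenance metadata before feeding downstream analytics.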
Minimum Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience). Seven (7) years of professional experience in data engineering, software development, or an equivalent mix of education and relevant experience in a similar role.
Preferred Qualifications
- Experience with Snowflake, Microsoft Azure Synapse, or other modern data-warehouse platforms (a brief sketch follows this list).
- Exposure to machine-learning pipelines (e.g., using OpenAI or other LLM services).
- Experience building/maintaining cloud data platforms (such as GCP, OCI, Linode, AWS, or Azure) and data-lake/warehouse solutions, as well as production workload management.
- Hands-on Linux system administration (containerization, networking, security).
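As a small illustration of the warehouse experience listed above, a minimal sketch using the snowflake-connector-python package; the account, credentials, and hpo_annotations table are hypothetical placeholders:

import snowflake.connector

# All identifiers below are placeholders; real credentials belong in a secrets manager.
conn = snowflake.connector.connect(
    user="SVC_RESEARCH",
    password="...",               # never hard-code credentials in practice
    account="myorg-myaccount",
    warehouse="RESEARCH_WH",
    database="CLINICAL_DB",
    schema="PHENOTYPES",
)
try:
    cur = conn.cursor()
    # Hypothetical table of LLM-extracted HPO annotations.
    cur.execute(
        "SELECT hpo_id, COUNT(*) AS n "
        "FROM hpo_annotations GROUP BY hpo_id ORDER BY n DESC LIMIT 10"
    )
    for hpo_id, n in cur.fetchall():
        print(hpo_id, n)
finally:
    conn.close()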
Work Schedule
Monday-Friday, 8:00 a.m.-5:00 p.m. Based at UTA (Arlington, TX), with regular on-site collaboration with Cook Children's teams in Fort Worth.
Required Experience:
IC