Job Title: Sr Data Engineer with Imaging
Experience: 10 years
Location (Onsite): San Francisco (4 days on-site at client location)
Key Responsibilities:
- Imaging Data Pipeline Delivery: Design, implement, and maintain automated pipelines for onboarding, verifying, transforming, and curating biomedical imaging data from clinical trials and real-world data sources for the therapeutic areas Oncology, Neurology, and Ophthalmology, covering all image file formats.
- Data Quality and Integrity: Develop and implement solutions to detect and correct anomalies and inconsistencies, achieving the highest data quality for the imaging data sets per industry standards (DICOM) and internal specifications such as FFS, RTS, GDSR, etc. Ensure de-identification, PHI/PII controls, and image-specific QC checks are implemented at scale.
- Data Analysis and Integration: Integrate ML- and AI-assisted tools into the pipelines for inline image analysis, classification, and segmentation to extract and enrich metadata for various analyses, optimize performance, etc.
- Image Data Management: Build and maintain large-scale catalogs of curated imaging data sets, advancing FAIR principles and making imaging data assets easy to discover and access.
- Compliance and Controls: Ensure applicable compliance and privacy controls are followed as required (GXP, CSV validations).
- Collaboration: Work closely with image scientists, data scientists, clinops, and biomarker research teams, supporting data needs for various primary and secondary endpoint analyses.
- External Collaboration: Work with external partners (e.g. CROs) to ensure the imaging data received is complete and conforms to the established agreements and quality standards.
- Lead the delivery team to ensure timely delivery of product backlog items and features.
- Participate in and lead various agile ceremonies with the team throughout planning and execution.
Skills:
- Worked with medical imaging data and platforms (PACS, VNAs, etc.)
- Worked with radiology imaging data (e.g. CT, PET, MRI, NIfTI) and ophthalmic imaging (OCT, FA, CFP, etc.)
- Good understanding of the DICOM standard: structure, metadata, tag parsing, and multi-frame images
- Worked with clinical data standards such as SDTM and ADaM
- Data integration across diverse data sources, e.g. imaging data with tabular clinical data
- De-identification methodologies, PHI/PII detection, and privacy controls
- Good understanding of GXP and CSV validation frameworks
- Proficient in Python and related libraries and tools: pandas, pydicom, SimpleITK, dicom-numpy, dcm2niix
- Hands-on experience with ETL/ELT involving large medical imaging datasets
- Apache Airflow, Spark, Talend, or similar tools for orchestrating complex workflows.
- Proficiency with SQL, NoSQL, and image metadata stores (PostgreSQL, MongoDB, etc.)
- Practical experience with AWS infrastructure and data services, e.g. RDS, Athena, Glue, EC2, Lambda, S3.
- Familiar with EKS, Docker, and HPC
- Experience in data analysis and report generation, e.g. TIBCO, Tableau, AWS QuickSight
- Good knowledge of Git, GitLab, and DevOps tools such as Jenkins and Terraform
- Familiar with using ML workflows for computer vision tasks such as segmentation and classification
- Nice to have: experience implementing NLP and GenAI solutions
- Worked with cross-functional global teams in a dynamic Agile environment
- Lead and mentor agile team members.
- Has 10 years of experience with data platforms, analysis, and insights
Educational Qualifications:
- Engineering degree (BE/ME/BTech/MTech/BSc/MSc).
- Technical certification in multiple technologies is desirable.