Data Engineer / Architect
Parsippany, NJ (Onsite Hybrid, 3 days a week) / Remote with 30% Travel
Full-time
Job Description:
- Design and develop scalable data pipelines using dbt Cloud, Databricks, and Apache Airflow to support enterprise analytics and reporting
- Build and optimize Delta Lake-based data models to enable analytics-ready datasets
- Implement advanced data modeling techniques, including star schemas, fact/dimension design, and Slowly Changing Dimensions (SCD) Types 1 and 2
- Develop modular, reusable, and testable SQL-based transformations using dbt models, macros, and packages
- Design and manage incremental data loading strategies, ensuring efficient processing of large-scale datasets
- Leverage Databricks SQL, Spark, and Delta Lake capabilities for high-performance data processing and optimization
- Implement robust data quality checks and testing frameworks using dbt tests (e.g., not null, unique, referential integrity)
- Collaborate with cross-functional teams, including data engineers, data scientists, and BI teams, to deliver business-driven data solutions
- Integrate dbt pipelines with CI/CD workflows using Git-based version control, and orchestrate jobs via Databricks Workflows or external schedulers
- Ensure adherence to data governance, security, and compliance standards, leveraging tools such as Unity Catalog and enterprise policies
- Orchestrate end-to-end workflows using Airflow DAGs, ensuring dependency management, scheduling, retries, and fault tolerance
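As an illustration of the SCD Type 2 pattern listed above, a minimal sketch in plain Python (field names are hypothetical; in practice, dbt snapshots generate equivalent SQL merge logic against the warehouse):

```python
def scd2_upsert(dim_rows, incoming, today):
    """Apply an SCD Type 2 update: expire changed current rows and
    append new versions, preserving full history.

    dim_rows: list of dicts with keys id, attr, valid_from, valid_to, is_current
    incoming: {id: attr} from the latest source extract
    today:    effective date of this load (any comparable value, e.g. ISO string)
    """
    out, seen = [], set()
    for row in dim_rows:
        if row["is_current"] and row["id"] in incoming:
            seen.add(row["id"])
            if incoming[row["id"]] != row["attr"]:
                # Attribute changed: close out the old version...
                out.append({**row, "valid_to": today, "is_current": False})
                # ...and open a new current version effective today.
                out.append({"id": row["id"], "attr": incoming[row["id"]],
                            "valid_from": today, "valid_to": None,
                            "is_current": True})
                continue
        out.append(row)
    # Ids never seen before get an initial current row.
    for key, attr in incoming.items():
        if key not in seen:
            out.append({"id": key, "attr": attr, "valid_from": today,
                        "valid_to": None, "is_current": True})
    return out
```

SCD Type 1 is the degenerate case of the same logic: overwrite `attr` in place and keep no expired rows.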
Technical Expertise
- AWS & Cloud Architecture: Expert-level experience with AWS services (S3, RDS, Bedrock agents), PostgreSQL, and cloud-based data governance
- Advanced Analytics: Regression analysis, time-series forecasting, multivariate analysis, and classification models
- MLOps & Deployment: Design and maintain model deployment, monitoring, and automated retraining pipelines
- Simulation & Forecasting: Agent-based simulation for trial enrollment forecasting and scenario planning
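The agent-based enrollment simulation named above can be sketched with nothing but the standard library. A toy Monte Carlo model; the parameters (activation probability, mean monthly enrollment) are illustrative assumptions, not figures from this posting:

```python
import random

def simulate_enrollment(n_sites, p_activate, mean_monthly, months,
                        n_runs=500, seed=7):
    """Toy agent-based sketch: each site is an agent that may activate,
    then enrolls a random number of patients each month.
    Returns p10/p50 percentiles of total enrollment across runs."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0
        for _ in range(n_sites):
            if rng.random() < p_activate:  # does the site open at all?
                for _ in range(months):
                    # Monthly enrollment: uniform on 0..2*mean_monthly,
                    # so the per-month mean is mean_monthly.
                    total += rng.randint(0, 2 * mean_monthly)
        totals.append(total)
    totals.sort()
    return {"p10": totals[n_runs // 10], "p50": totals[n_runs // 2]}
```

Scenario planning then amounts to re-running the simulation under different site counts or activation assumptions and comparing the percentile bands.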
Data & Analytics Capabilities
- Feature Engineering: Extract insights from site performance, historical enrollment, and competitive landscape data
- Model Evaluation: Build evaluation frameworks (AUC, precision/recall) and optimize model granularity across disease areas and geographies
- Enterprise Data Integration: Merge internal (CTMS performance data) and external sources (Citeline, epidemiological data)
- Master Data Management: Create Golden ID datasets with data quality monitoring and continuous refresh capabilities
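The evaluation metrics cited above (precision/recall, alongside AUC) reduce to a few lines. A minimal sketch, assuming binary labels where 1 marks the positive class, e.g. a site predicted to be non-enrolling:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class,
    e.g. a site flagged as likely non-enrolling)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In practice these would usually come from a metrics library (e.g. scikit-learn), but the definitions above are what any evaluation framework is computing under the hood.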
Experience Required
- 5 years of experience in pharmaceutical/clinical trial analytics
- Focus on site selection and non-enrollment prediction
- Proven track record with clinical operations data systems