Remote, but may require an onsite visit once every 3 months
3 professional references
The scope of the proposed services will include the following:
Assess feasibility and technical requirements for LINKS DataLake integration.
Build and optimize ETL workflows for LINKS and complementary datasets (Vital Records, labs, registries); a minimal sketch of such a workflow follows this list.
Design scalable data workflows to improve data quality, integrity, and identity resolution.
Implement data governance, observability, and lineage tracking across all pipelines.
Mentor engineers, support testing, and enforce best practices in orchestration and architecture.
Document and communicate technical solutions to technical and non-technical stakeholders.
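As a concrete illustration of the ETL work described above, here is a minimal Python sketch. The API URL, field names, and target table are hypothetical placeholders, not part of the actual LINKS environment; SQLite stands in for the real warehouse.

import sqlite3  # stand-in for the real warehouse (Oracle, SQL Server, etc.)

import pandas as pd
import requests

def extract(url: str) -> pd.DataFrame:
    # Pull records from a (hypothetical) registry API returning a JSON list.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Basic cleanup: normalize column names, drop exact duplicates,
    # and parse the (hypothetical) event_date field so joins are reliable.
    df = df.rename(columns=str.lower).drop_duplicates()
    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
    return df.dropna(subset=["event_date"])

def load(df: pd.DataFrame, conn: sqlite3.Connection) -> None:
    # Replace-load into a staging table; production code would merge incrementally.
    df.to_sql("staging_records", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    load(transform(extract("https://example.org/api/records")), conn)

In practice each step would also emit lineage metadata and run under an orchestrator, per the governance and observability items above.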
Expertise and/or relevant experience in the following areas are mandatory:
3 years of experience in data engineering and/or data architecture.
2 years of experience with Python for ETL and automation (pandas, requests, API integration).
2 years of hands-on experience with SQL queries, stored procedures, and performance tuning (preferably Oracle, SQL Server, or MySQL).
1 year of experience with ETL orchestration tools (Prefect, Airflow, or equivalent).
1 year of experience with cloud platforms (Azure, AWS, or GCP), including data onboarding/migration.
1 year of exposure to data lake / medallion architecture (bronze, silver, gold); see the sketch after this list.
2 years of experience providing written documentation and verbal communication for cross-functional collaboration.
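The medallion item above refers to layering raw (bronze), cleaned (silver), and analysis-ready (gold) data. Below is an illustrative sketch using Prefect 2-style @flow/@task decorators; the file path, column names, and dedup key are hypothetical, not the actual LINKS pipeline.

import pandas as pd
from prefect import flow, task

@task
def bronze(raw_path: str) -> pd.DataFrame:
    # Bronze: land the source data as-is, no transformation.
    return pd.read_csv(raw_path)

@task
def silver(df: pd.DataFrame) -> pd.DataFrame:
    # Silver: cleaned and conformed - typed dates, deduplicated
    # on a (hypothetical) person-level key.
    df["birth_date"] = pd.to_datetime(df["birth_date"], errors="coerce")
    return df.drop_duplicates(subset=["person_id"])

@task
def gold(df: pd.DataFrame) -> pd.DataFrame:
    # Gold: analysis-ready aggregate for downstream consumers.
    return df.groupby("county", as_index=False).agg(cases=("person_id", "count"))

@flow
def medallion_pipeline(raw_path: str = "raw/records.csv") -> pd.DataFrame:
    return gold(silver(bronze(raw_path)))

if __name__ == "__main__":
    medallion_pipeline()

An equivalent Airflow DAG would express the same bronze-to-gold dependency chain; the layering pattern, not the tool, is the point.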
Expertise and/or relevant experience in the following areas are desirable but not mandatory:
5 years of experience in data engineering roles.
Experience integrating or developing REST/JSON or XML APIs.
Familiarity with CI/CD pipelines (GitHub Actions, Azure DevOps, etc.).
Exposure to Infrastructure as Code (Terraform, CloudFormation).
Experience with data governance and metadata tools (Atlan, OpenMetadata, Collibra).
Experience with public health/healthcare datasets or similar, including PHI/PII handling.
Familiarity with SAS and R workflows to support epidemiologists and analysts.
Experience with additional SQL platforms (Postgres, Snowflake, Redshift, BigQuery).
Familiarity with data quality frameworks (Great Expectations, Deequ); a hand-rolled sketch of this style of check follows this list.
Experience with real-time/streaming tools (Kafka, Spark Streaming).
Familiarity with big data frameworks for large-scale transformations (Spark, Hadoop).
Knowledge of data security and compliance frameworks (HIPAA, SOC 2, etc.).
Agile/SCRUM team experience.
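Because the APIs of the data quality frameworks named above are version-sensitive, here is a hand-rolled stand-in showing the declarative expectation style that tools like Great Expectations or Deequ formalize; the column names and thresholds are hypothetical.

import pandas as pd

def expect_not_null(df: pd.DataFrame, col: str) -> bool:
    return df[col].notna().all()

def expect_unique(df: pd.DataFrame, col: str) -> bool:
    return df[col].is_unique

def expect_between(df: pd.DataFrame, col: str, lo, hi) -> bool:
    return df[col].between(lo, hi).all()

def validate(df: pd.DataFrame) -> dict[str, bool]:
    # Each entry mirrors one "expectation"; a failing check would
    # block promotion from bronze to silver in a medallion pipeline.
    return {
        "person_id not null": expect_not_null(df, "person_id"),
        "person_id unique": expect_unique(df, "person_id"),
        "age in range": expect_between(df, "age", 0, 120),
    }

if __name__ == "__main__":
    sample = pd.DataFrame({"person_id": [1, 2, 3], "age": [34, 51, 7]})
    for check, passed in validate(sample).items():
        print(f"{'PASS' if passed else 'FAIL'}: {check}")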