We are seeking a highly skilled Data QA Lead with 8-10 years of experience in Big Data and Spark technologies to drive the quality assurance strategy for the Healthcare project. The ideal candidate will have 3-4 years of hands-on experience with Databricks and a strong product-building mindset. You will lead the validation of scalable data pipelines, ensuring data integrity across Delta Lake and Unity Catalog environments.
Key Responsibilities:
Use Databricks Notebooks to author comprehensive technical test cases and validation logic in PySpark and Spark SQL.
Design and execute testing strategies for both structured and unstructured data (PDFs, text files, and RTF documents) to ensure high-fidelity transformation into structured formats within the Data Lakehouse.
Validate all Source-to-Target (S2T) mappings across the bronze, silver, and gold layers to ensure data lineage and integrity.
Leverage strong PySpark/Spark SQL knowledge to create mock data for exhaustive edge-case testing.
Orchestrate test and validation jobs using Databricks Workflows.
Design, implement, and maintain an automated regression suite to ensure pipeline stability across code releases.
Validate end-to-end data accuracy from the Lakehouse layers to final analytical dashboards and reporting tools.
Lead the review of test cases and drive development best practices, including code reviews and performance optimization of test scripts.
Implement and verify data quality checks within Unity Catalog to ensure proper data lineage and access control.
Partner with data engineers and analysts to align quality benchmarks with Healthix business goals.
Required Skills & Experience:
8-10 years of experience in Big Data Engineering using Apache Spark and related technologies.
Extensive hands-on experience with Databricks (Notebooks, Jobs, Workflows, Delta Lake).
A minimum of 2-3 years of hands-on experience with Databricks, including:
Delta Lake for scalable and reliable data lake architectures.
Unity Catalog for centralized data governance.
Job & Workflow orchestration including DLT pipelines.
Experience validating DLT and modern ETL/ELT design patterns.
Hands-on experience with workflow orchestration using Databricks Workflows.
Proficiency in PySpark, Spark SQL, advanced SQL, and Spark optimization techniques.
Experience with the AWS cloud platform.
Excellent communication, stakeholder management, and leadership skills.
Preferred Qualifications:
Hands-on experience with the PDM (Patient Data Model) and OMOP (Observational Medical Outcomes Partnership) common data models on Databricks.
Experience with k-anonymity testing and data de-identification validation protocols.
Databricks Certified Data Engineer Associate or Professional certification is a plus.
Experience contributing to internal quality accelerators or testing utilities.