Sr. Databricks Engineer - 20823
We are looking for an experienced and technically strong Senior Databricks Developer with 5-8 years of experience in PySpark and Databricks. This role involves leading the design and development of scalable data pipelines, reusable accelerator components, and performance-optimized Spark workloads. The candidate will work closely with architects and platform teams and take ownership of delivering production-grade data solutions.
Key Responsibilities:
Lead the design and development of scalable, reusable data pipelines and accelerator frameworks using PySpark and Databricks.
Design and execute testing strategies for both structured and unstructured data (PDFs, text files, and RTF documents) to ensure high-fidelity transformation into structured formats within the Data Lakehouse.
Validate all Source-to-Target (S2T) mappings across the bronze, silver, and gold layers to ensure data lineage and integrity.
Collaborate with architects and stakeholders to translate business and technical requirements into robust data solutions.
Own end-to-end development, testing, and deployment of data pipelines across multiple environments.
Drive Spark performance optimization, cost efficiency, and best practices across Databricks workloads.
Design and manage workflow orchestration using Databricks Workflows and Apache Airflow.
Leverage strong PySpark/Spark SQL knowledge to create mock data for exhaustive edge-case testing.
Review code, mentor junior developers, and enforce engineering best practices.
Monitor, troubleshoot, and resolve complex development issues to ensure SLA compliance and data reliability.
Contribute to CI/CD pipeline design and automation for data engineering workflows.
Required Skills & Experience:
5-8 years of experience in data engineering with strong expertise in PySpark and distributed data processing.
Extensive hands-on experience with Databricks (Notebooks, Jobs, Workflows, Delta Lake).
Strong command of Spark SQL, advanced SQL, performance tuning, and large-scale data transformations.
Proven experience designing modular, reusable, and testable data frameworks or accelerators.
Hands-on experience with workflow orchestration using Databricks Workflows.
Solid understanding of CI/CD practices and Git-based source control.
Working knowledge of the AWS cloud platform.
Strong communication skills with experience collaborating across teams.
Preferred Qualifications:
Experience building enterprise-grade internal accelerators or data platforms.
Working knowledge of Unity Catalog for data governance, access control, and lineage.
Hands-on Databricks experience with the PDM (Patient Data Model) and OMOP (Observational Medical Outcomes Partnership) common data models.
Experience with K-anonymity testing and data de-identification validation protocols.
Exposure to Delta Live Tables or declarative pipeline development patterns.
Exposure to data quality expectations (such as Delta Live Tables expectations).
Experience with monitoring, logging, and observability of Spark and Databricks workloads.
Databricks certification (Associate or Professional) is a plus.
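The K-anonymity testing mentioned in the preferred qualifications boils down to one check: every combination of quasi-identifier values must occur at least k times. A minimal plain-Python sketch (column names and rows are invented for illustration; production validation would run on Spark DataFrames):

```python
# Minimal k-anonymity check. A dataset is k-anonymous if every combination
# of quasi-identifier values appears at least k times. Rows and column
# names below are invented for illustration.
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times."""
    groups = Counter(
        tuple(rec[col] for col in quasi_identifiers) for rec in records
    )
    return all(count >= k for count in groups.values())


rows = [
    {"zip": "02139", "age_band": "30-39", "diagnosis": "A"},
    {"zip": "02139", "age_band": "30-39", "diagnosis": "B"},
    {"zip": "02139", "age_band": "40-49", "diagnosis": "C"},
]
print(is_k_anonymous(rows, ["zip", "age_band"], 2))  # → False (one group of size 1)
```

De-identification validation then consists of generalizing or suppressing quasi-identifiers until this check passes for the required k.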