Data Engineer with Databricks and Spark

Sumeru Solutions

Job Location:

Bellevue, WA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real-time workloads. It involves implementing and managing the Medallion Architecture (Bronze, Silver, Gold) using Azure Data Factory, Databricks (PySpark), Azure SQL Database, and Databricks Unity Catalog.
The role requires ensuring SLA-adherent data quality standards. Success is measured by pipeline reliability, data freshness, SLA compliance, and the quality of Gold-layer datasets powering Power BI executive dashboards.
The work supports organizational decision-making by delivering trusted, well-governed data to business executives and analytics consumers.

Required Skills:

Experience building and optimizing big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi-structured data sets
Hands-on experience implementing Medallion Architecture (Bronze/Silver/Gold)
Experience with Delta Lake: ACID transactions, incremental loading, schema evolution, and partitioning strategies
Experience performing root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities

Azure Foundational Services:

Working knowledge of: Azure Data Factory (ADF), ADLS Gen2, Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure Monitor / Log Analytics, Azure Event Hubs, Microsoft Fabric Lakehouse, and Azure Active Directory / Entra ID (RBAC, Service Principals)

Programming Languages:

Proficiency in Python and PySpark for data transformation, pipeline automation, and large-scale distributed processing; strong SQL skills, including window functions, CTEs, and query optimization across relational and lakehouse engines

Data Architecture:

Solid understanding of Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and the trade-offs between lakehouse, data warehouse, and data lake patterns

Pipeline Engineering:

Ability to build robust ADF pipelines with ForEach, Lookup, Copy Activity, and Data Flows; incremental loading via watermark or CDC; error handling, retry logic, and dead-letter patterns

Data Quality Experience:

Experience implementing SLA-based data quality checks (freshness, completeness, row count), monitoring via Azure Monitor and ADF diagnostic logs, and defining data quality agreements with business stakeholders

DevOps for Data:

Experience with Git-based workflows, ADF Git integration, and CI/CD pipeline promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions

Reporting Layer Awareness:

Understanding of how Gold-layer data feeds Power BI: DirectQuery vs. Import mode trade-offs, dataset refresh patterns, and semantic model collaboration with BI teams
Ability to manage work across multiple concurrent pipeline projects, prioritize by business impact, and communicate status clearly to technical and non-technical stakeholders

Good to have skills:

Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines) on an active migration or greenfield project
Experience with real-time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark
Experience delivering data platforms for executive-level reporting via Power BI semantic models
