Data Engineer with Databricks and Spark
Bellevue, WA - USA
Job Summary
Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Data Engineer with Databricks and Spark
Location: Bellevue WA
Duration: 4 Months
Job Type: Temporary Assignment
Work Type: Onsite/Hybrid
Job Description
- This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real-time workloads.
- It involves implementing and managing the Medallion Architecture (Bronze, Silver, Gold) using Azure Data Factory, Databricks (PySpark), Azure SQL Database, and Databricks Unity Catalog, as sketched below.
- The role requires enforcing SLA-adherent data quality standards. Success is measured by pipeline reliability, data freshness SLA compliance, and the quality of Gold-layer datasets powering Power BI executive dashboards.
- The work supports organizational decision-making by delivering trusted, well-governed data to business executives and analytics consumers.
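For orientation, here is a minimal PySpark sketch of that Bronze → Silver → Gold flow. The table names, paths, and columns (orders, amount, customer_id) are illustrative placeholders, not the client's actual schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw data as-is, stamped with ingestion metadata.
bronze = (
    spark.read.json("/landing/orders/")               # placeholder landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders")

# Silver: cleanse, deduplicate, and enforce types.
silver = (
    spark.read.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("order_date"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregate that would feed a Power BI dashboard.
gold = (
    spark.read.table("silver.orders")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("daily_revenue"),
         F.countDistinct("customer_id").alias("daily_customers"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.daily_sales")
```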
Required Skills:
- Experience building and optimizing big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi-structured datasets
- Hands-on experience implementing Medallion Architecture (Bronze/Silver/Gold)
- Experience with Delta Lake: ACID transactions, incremental loading, schema evolution, and partitioning strategies (see the upsert sketch after this list)
- Experience performing root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities
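One common shape for the incremental-loading requirement above is a Delta Lake MERGE upsert. This is a sketch under assumed table and key names (silver.orders, order_id), not the client's actual pipeline.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Incoming batch of new/changed rows (placeholder staging path).
updates = spark.read.format("delta").load("/staging/orders_changes")

# Optional: let MERGE add columns that appear in the source but not yet
# in the target (Delta's schema evolution for merges).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

target = DeltaTable.forName(spark, "silver.orders")

# The whole MERGE commits as one ACID transaction on the Delta table:
# matched keys are updated in place, unmatched keys are inserted.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```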
Azure Foundational Services:
- Working knowledge of: Azure Data Factory (ADF), ADLS Gen2, Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure Monitor / Log Analytics, Azure Event Hubs, Microsoft Fabric Lakehouse, Azure Active Directory / Entra ID (RBAC, Service Principals)
Programming Languages:
- Proficiency in Python and PySpark for data transformation, pipeline automation, and large-scale distributed processing; strong SQL skills including window functions, CTEs, and query optimization across relational and lakehouse engines, as in the sketch below
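As an illustration of the SQL bar here, a window function over a CTE run through Spark SQL. The deduplicate-to-latest-row pattern and all table and column names are assumptions for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Keep only the most recent version of each order, using a CTE plus a
# ROW_NUMBER window partitioned by the business key.
latest_orders = spark.sql("""
    WITH ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM bronze.orders
    )
    SELECT * FROM ranked WHERE rn = 1
""")
latest_orders.createOrReplaceTempView("orders_latest")
```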
Data Architecture:
- Solid understanding of Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and the trade-offs between lakehouse, data warehouse, and data lake patterns; a Type 2 sketch follows
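For the SCD part specifically, a minimal Type 2 pattern on Delta: expire the current row when a tracked attribute changes, then append the new version. The dimension, key, and attribute names (gold.dim_customer, customer_id, address) are invented for the sketch.

```python
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

changes = spark.read.table("staging.customer_changes")   # placeholder source
dim = DeltaTable.forName(spark, "gold.dim_customer")

# Step 1: close out the current version where a tracked attribute changed.
(
    dim.alias("d")
    .merge(changes.alias("c"),
           "d.customer_id = c.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.address <> c.address",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute()
)

# Step 2: append the new version as the current row. (A production job
# would first filter `changes` down to rows that actually differ.)
(
    changes
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").saveAsTable("gold.dim_customer")
)
```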
Pipeline Engineering:
- Ability to build robust ADF pipelines with ForEach, Lookup, Copy Activity, and Data Flows; incremental loading via watermark or CDC (sketched below); and error handling, retry logic, and dead-letter patterns
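The watermark pattern mentioned above, sketched on the Databricks side (ADF typically keeps the watermark in a control table read via a Lookup activity). The control table etl.watermarks and the column names are assumptions for the example.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read the high-water mark recorded by the previous successful run.
last_mark = (
    spark.read.table("etl.watermarks")
    .filter(F.col("table_name") == "orders")
    .first()["last_loaded_at"]
)

# Pull only rows modified since that watermark.
delta_rows = (
    spark.read.table("bronze.orders")
    .filter(F.col("modified_at") > F.lit(last_mark))
)
delta_rows.write.format("delta").mode("append").saveAsTable("silver.orders")

# Advance the watermark only after the load has succeeded, so a failed
# run is simply retried from the old mark.
new_mark = delta_rows.agg(F.max("modified_at")).first()[0]
if new_mark is not None:
    spark.sql(
        "UPDATE etl.watermarks SET last_loaded_at = "
        f"'{new_mark}' WHERE table_name = 'orders'"
    )
```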
Data Quality Experience:
- Experience implementing SLA-based data quality checks (freshness, completeness, row count), monitoring via Azure Monitor and ADF diagnostic logs, and defining data quality agreements with business stakeholders; a sketch of such checks follows
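A sketch of what those three checks can look like as a notebook step; thresholds and table/column names are illustrative, and raising an exception fails the run so existing ADF/Azure Monitor alerting fires.

```python
from datetime import datetime, timedelta, timezone
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("gold.daily_sales")      # illustrative Gold table

failures = []

# Freshness: newest data must be no older than the agreed SLA window.
max_date = df.agg(F.max("order_date")).first()[0]
if max_date is None or max_date < (datetime.now(timezone.utc).date() - timedelta(days=1)):
    failures.append(f"freshness: latest order_date is {max_date}")

# Row count: guard against an accidentally empty or truncated load.
count = df.count()
if count < 1000:                                # threshold is illustrative
    failures.append(f"row count: only {count} rows")

# Completeness: key measures must not contain nulls.
null_rows = df.filter(F.col("daily_revenue").isNull()).count()
if null_rows > 0:
    failures.append(f"completeness: {null_rows} null revenue rows")

# Fail the job so the orchestrator marks the run red.
if failures:
    raise RuntimeError("Data quality SLA breach: " + "; ".join(failures))
```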
DevOps for Data:
- Experience with Git-based workflows, ADF Git integration, and CI/CD pipeline promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions
Reporting Layer Awareness:
- Understanding of how Gold-layer data feeds Power BI: DirectQuery vs. Import mode trade-offs, dataset refresh patterns, and semantic model collaboration with BI teams
- Ability to manage work across multiple concurrent pipeline projects, prioritize by business impact, and communicate status clearly to technical and non-technical stakeholders
Good to have skills:
- Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines) in an active migration or greenfield project
- Experience with real-time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark (see the sketch after this list)
- Experience delivering data platforms for executive-level reporting via Power BI semantic models
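On the streaming item above, one way to wire Event Hubs into Structured Streaming is through Event Hubs' Kafka-compatible endpoint. The namespace, hub name, paths, and connection string below are placeholders; in practice the secret would come from Azure Key Vault.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder connection string (retrieve from Key Vault in real use).
conn = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..."

# Event Hubs exposes a Kafka-compatible endpoint on port 9093, so
# Spark's built-in Kafka source can consume it.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
    .option("subscribe", "orders-hub")                  # event hub name
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config",
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            f'username="$ConnectionString" password="{conn}";')
    .load()
)

# Land raw events into the Bronze layer with exactly-once checkpointing.
(
    stream.select(F.col("value").cast("string").alias("payload"),
                  F.col("timestamp").alias("event_time"))
    .writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/orders_bronze")
    .toTable("bronze.order_events")
)
```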
TekWissen Group is an equal opportunity employer supporting workforce diversity.