Position : Databricks Engineer
Location : Houston TX (Onsite)
Term : C2C/W2/Full-Time Role
Job Description :
Client is modernizing its data platform to drive reliability, scalability, and real-time insights across key domains such as vehicle distribution, dealer performance, supply chain, sales & incentives, and warranty analytics. We are seeking a Databricks-centric Data Engineer to design, build, and optimize enterprise-grade data pipelines on AWS Databricks, leveraging Delta Lake, Unity Catalog, Workflows, and Structured Streaming. The ideal candidate is hands-on with Spark (PySpark/Scala) and has strong SQL/data modeling skills.
Key Responsibilities
Platform & Architecture
- Design and implement Medallion (Bronze/Silver/Gold) architecture for batch and streaming workloads using Delta Lake.
- Build scalable data pipelines using Databricks Workflows, Delta Live Tables (DLT), and Structured Streaming (Kafka/Event Hubs); see the sketch after this list.
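The sketch below is a minimal, illustrative example of the Bronze-to-Silver portion of such a pipeline using Auto Loader, Structured Streaming, and Delta tables. The paths, schema handling, and table names (e.g. main.bronze.vehicle_events) are assumptions for illustration, not the client's actual design; a Gold layer would typically be built on top via DLT or a scheduled Workflow.
```python
# Minimal Bronze -> Silver streaming sketch for a Databricks notebook, where `spark`
# is provided by the runtime. All paths and table names are illustrative placeholders.
from pyspark.sql import functions as F

bronze = (
    spark.readStream.format("cloudFiles")                      # Auto Loader incremental ingest
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/vehicle_events")
    .load("s3://example-bucket/raw/vehicle_events/")
)

(bronze.writeStream                                             # land raw data in Bronze as-is
    .option("checkpointLocation", "s3://example-bucket/_chk/bronze_vehicle_events")
    .toTable("main.bronze.vehicle_events"))

silver = (
    spark.readStream.table("main.bronze.vehicle_events")
    .withColumn("event_ts", F.to_timestamp("event_ts"))        # typed, cleansed Silver layer
    .dropDuplicates(["event_id"])                               # simple dedup; bound state with a watermark in production
)

(silver.writeStream
    .option("checkpointLocation", "s3://example-bucket/_chk/silver_vehicle_events")
    .toTable("main.silver.vehicle_events"))
```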
Engineering & Development
- Develop robust PySpark/Scala code with modular reusable libraries.
- Optimize Spark jobs (partitioning, Z-Ordering, file sizes, caching, AQE) and manage cost/performance on Databricks; see the sketch after this list.
- Author Databricks SQL queries/dashboards for data quality checks and business consumption.
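A hedged sketch of the kind of tuning and data-quality work these bullets describe follows; the table and column names (main.silver.vehicle_events, dealer_id, vin) are assumed purely for illustration.
```python
# Illustrative tuning/quality sketch for a Databricks notebook (`spark` provided by the
# runtime); all table and column names are assumptions.

# Adaptive Query Execution: let Spark re-plan joins and coalesce shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# Compact small files and co-locate data on a frequently filtered column (Z-Ordering).
spark.sql("OPTIMIZE main.silver.vehicle_events ZORDER BY (dealer_id)")

# Cache a hot dimension reused across several joins in the same job.
dim_dealer = spark.table("main.gold.dim_dealer").cache()
dim_dealer.count()  # materialize the cache before reuse

# A simple Databricks SQL data-quality check that could feed a dashboard or alert.
null_vins = spark.sql("""
    SELECT COUNT(*) AS null_vin_count
    FROM main.silver.vehicle_events
    WHERE vin IS NULL
""").first()["null_vin_count"]
```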
Data Governance & Security
- Implement Unity Catalog for centralized governance: catalogs, schemas, managed tables, Row-Level & Column-Level Security, and PII masking; see the sketch after this list.
- Enforce cluster policies, secret scopes/Key Vault, and workspace best practices for compliance (e.g., SOX/CCPA/GDPR where applicable).
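Below is a minimal, hedged sketch of Unity Catalog row filters, column masks, and grants issued from a notebook via spark.sql. The catalog, schema, table, and group names (main.governance, pii_readers, dealer_ops_analysts, and so on) are assumptions for illustration only.
```python
# Hedged Unity Catalog governance sketch; all object and group names are illustrative.

# Column mask: reveal SSNs only to an assumed privileged group, mask them for everyone else.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.mask_ssn(ssn STRING)
    RETURN CASE WHEN is_account_group_member('pii_readers') THEN ssn ELSE '***-**-****' END
""")
spark.sql("ALTER TABLE main.silver.customers ALTER COLUMN ssn SET MASK main.governance.mask_ssn")

# Row filter: non-admin users only see US rows in an assumed dealer_sales table.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.us_only(region STRING)
    RETURN is_account_group_member('admins') OR region = 'US'
""")
spark.sql("ALTER TABLE main.silver.dealer_sales SET ROW FILTER main.governance.us_only ON (region)")

# Grant read access on the Gold schema to a business analyst group
# (USE CATALOG / USE SCHEMA privileges would also be needed in practice).
spark.sql("GRANT SELECT ON SCHEMA main.gold TO `dealer_ops_analysts`")
```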
Stakeholder Engagement
- Work closely with GST business teams (Dealer Ops, Sales, Supply Chain) to translate requirements into data models and pipelines.
- Produce clear technical documentation, lineage, and runbooks; provide knowledge transfer to support teams.
Required Qualifications
- 8 years overall in Data Engineering; 3 years hands-on with Databricks and Apache Spark (PySpark/Scala).
- Strong SQL and data modeling (dimensional modeling, star/snowflake schemas, slowly changing dimensions, time-series).
- Expertise with Delta Lake, Databricks SQL, Unity Catalog, DLT, Workflows, Auto Loader, and Structured Streaming.
- Proven CI/CD experience: Git, Azure DevOps Pipelines, dbx, unit/integration testing, and environment promotion.
- Solid understanding of security & compliance (RBAC/ABAC, data masking, secrets, network best practices).
- Excellent communication; ability to engage with product owners, architects, and business analysts.
Cloud BC Labs Inc is a digital transformation organization aimed at creating seamless solutions for clients to effectively manage their business operations. The company specializes in Business and Management Consulting, AI/ML, Data Analytics & Visualization, Cloud Data Warehouse Migration, Snowflake Implementation, Informatica Implementation & Upgrade, Staffing Services, and Data Management Solutions.