Job Description:
- Experience: 5 years of hands-on data engineering experience, with at least 3 years focused on the Databricks/Spark ecosystem.
- Databricks Expertise: Deep hands-on expertise with the Databricks Lakehouse Platform, including Delta Lake, Structured Streaming, Delta Live Tables, and cluster configuration/optimization.
- Programming Mastery: Expert-level proficiency in Python and PySpark. Advanced SQL skills are essential.
- Data Warehousing Concepts: Strong understanding of data modeling principles, including dimensional modeling (Kimball), data warehousing concepts, and ETL/ELT design patterns.
- Cloud Proficiency: Proven experience working with a major cloud provider (Azure, AWS, or GCP), particularly with data storage (e.g., S3) and related services.
- Software Engineering Mindset: Experience with software engineering best practices, including version control (Git), code reviews, testing, and CI/CD.
Roles and Responsibilities:
- Data Pipeline Development: Design, code, and deploy robust, scalable batch and streaming data pipelines using PySpark, Spark SQL, and Delta Live Tables to ingest data from sources such as Point-of-Sale (POS) systems, e-commerce platforms, loyalty systems, and marketing clouds.
- Data Modeling and Transformation: Implement complex data transformations and business logic within the Medallion architecture (Bronze, Silver, and Gold layers). Build and optimize the final Gold customer-dimension tables that will serve as the single source of truth.
- Data Quality: Implement data quality frameworks and cleansing routines to ensure the accuracy and trustworthiness of the Customer 360 data.
- Performance Optimization: Proactively monitor, debug, and tune Databricks jobs and Spark clusters for performance and cost-efficiency. Implement best practices for partitioning, caching, and data layout in Delta Lake.
- Infrastructure as Code (IaC) & CI/CD: Work with DevOps teams to manage Databricks environments, clusters, and job deployments using tools such as Terraform and Azure DevOps/GitHub Actions. Champion and implement CI/CD best practices for data pipelines.
- Data Governance and Security: Implement data governance features within Databricks Unity Catalog, including data lineage tracking, access controls, and data masking, to ensure compliance and security.
- Collaboration: Partner closely with Functional Consultants, Data Scientists, and Analytics Engineers to understand their data requirements and deliver well-structured, consumption-ready datasets.