Databricks Architect

Apptad Inc


Job Location:

Marlborough, NH - USA

Monthly Salary: Not Disclosed
Posted on: 8 hours ago
Vacancies: 1

Job Summary

Job Description:

  • Experience: 5 years of hands-on data engineering experience, with at least 3 years focused on the Databricks/Spark ecosystem.
  • Databricks Expertise: Deep hands-on expertise with the Databricks Lakehouse Platform, including Delta Lake, Structured Streaming, Delta Live Tables, and cluster configuration/optimization.
  • Programming Mastery: Expert-level proficiency in Python and PySpark. Advanced SQL skills are essential.
  • Data Warehousing Concepts: Strong understanding of data modeling principles, including dimensional modeling (Kimball), data warehousing concepts, and ETL/ELT design patterns.
  • Cloud Proficiency: Proven experience working with a major cloud provider (Azure, AWS, or GCP), particularly with data storage (e.g., S3) and related services.
  • Software Engineering Mindset: Experience with software engineering best practices, including version control (Git), code reviews, testing, and CI/CD.

Roles and Responsibilities

  • Data Pipeline Development: Design, code, and deploy robust, scalable batch and streaming data pipelines using PySpark, Spark SQL, and Delta Live Tables to ingest data from sources such as Point-of-Sale (POS), e-commerce platforms, loyalty systems, and marketing clouds.
  • Data Modeling and Transformation: Implement complex data transformations and business logic within the Medallion architecture (Bronze, Silver, and Gold layers). Build and optimize the final Gold customer-dimension tables that will serve as the single source of truth.
  • Data Quality: Implement data quality frameworks and cleansing routines to ensure the accuracy and trustworthiness of the Customer 360 data.
  • Performance Optimization: Proactively monitor, debug, and tune Databricks jobs and Spark clusters for performance and cost-efficiency. Implement best practices for partitioning, caching, and data layout in Delta Lake.
  • Infrastructure as Code (IaC) & CI/CD: Work with DevOps teams to manage Databricks environments, clusters, and job deployments using tools like Terraform and Azure DevOps/GitHub Actions. Champion and implement CI/CD best practices for data pipelines.
  • Data Governance and Security: Implement data governance features within Databricks Unity Catalog, including data lineage tracking, access controls, and data masking, to ensure compliance and security.
  • Collaboration: Partner closely with Functional Consultants, Data Scientists, and Analytics Engineers to understand their data requirements and deliver well-structured, consumption-ready datasets.
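For candidates unfamiliar with the terms above, the Medallion layering and Delta Lake layout practices mentioned in these responsibilities might look roughly like the following Spark SQL sketch. This is illustrative only: the schema, table, and column names (gold.dim_customer, silver.customers_cleansed, etc.) are hypothetical, not part of this role's actual environment.

```sql
-- Hypothetical Gold-layer customer dimension built from a cleansed Silver table
-- (Bronze = raw ingest, Silver = cleansed/conformed, Gold = consumption-ready).
CREATE TABLE IF NOT EXISTS gold.dim_customer
USING DELTA
PARTITIONED BY (signup_year)  -- coarse partitioning for pruning on a low-cardinality column
AS
SELECT
  customer_id,
  first_name,
  last_name,
  email,
  year(signup_date) AS signup_year
FROM silver.customers_cleansed;

-- Co-locate rows commonly filtered by customer_id to reduce files scanned per query.
OPTIMIZE gold.dim_customer ZORDER BY (customer_id);
```

Partitioning on a high-cardinality key (like customer_id itself) is generally discouraged in Delta Lake; ZORDER clustering on that key, as shown, is the more common layout choice.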