Job Description:
- Experience: 5 years of hands-on data engineering experience, with at least 3 years focused on the Databricks/Spark ecosystem.
- Databricks Expertise: Deep hands-on expertise with the Databricks Lakehouse Platform, including Delta Lake, Structured Streaming, Delta Live Tables, and cluster configuration/optimization.
- Programming Mastery: Expert-level proficiency in Python and PySpark. Advanced SQL skills are essential.
- Data Warehousing Concepts: Strong understanding of data modeling principles, including dimensional modeling (Kimball), data warehousing concepts, and ETL/ELT design patterns.
- Cloud Proficiency: Proven experience working with a major cloud provider (Azure, AWS, or GCP), particularly with data storage (e.g., S3) and related services.
- Software Engineering Mindset: Experience with software engineering best practices, including version control (Git), code reviews, testing, and CI/CD.
Roles and Responsibilities:
- Data Pipeline Development: Design, code, and deploy robust, scalable batch and streaming data pipelines using PySpark, Spark SQL, and Delta Live Tables to ingest data from sources such as Point-of-Sale (POS) systems, e-commerce platforms, loyalty systems, and marketing clouds.
- Data Modeling and Transformation: Implement complex data transformations and business logic within the Medallion architecture (Bronze, Silver, and Gold layers). Build and optimize the final Gold customer-dimension tables that will serve as the single source of truth.
- Data Quality: Implement data quality frameworks and cleansing routines to ensure the accuracy and trustworthiness of the Customer 360 data.
- Performance Optimization: Proactively monitor, debug, and tune Databricks jobs and Spark clusters for performance and cost-efficiency. Implement best practices for partitioning, caching, and data layout in Delta Lake.
- Infrastructure as Code (IaC) & CI/CD: Work with DevOps teams to manage Databricks environments, clusters, and job deployments using tools such as Terraform and Azure DevOps/GitHub Actions. Champion and implement CI/CD best practices for data pipelines.
- Data Governance and Security: Implement data governance features within Databricks Unity Catalog, including data lineage tracking, access controls, and data masking, to ensure compliance and security.
- Collaboration: Partner closely with Functional Consultants, Data Scientists, and Analytics Engineers to understand their data requirements and deliver well-structured, consumption-ready datasets.