To develop, implement, and optimize complex Data Warehouse (DWH) and Data Lakehouse solutions using the Databricks platform (including Delta Lake, Unity Catalog, and Spark) to ensure a scalable, high-performance, and governed data foundation for analytics, reporting, and Machine Learning.
Responsibilities
A. Databricks Development and Architecture
- Advanced Design and Implementation: Design and implement robust, scalable, and high-performance ETL/ELT data pipelines using PySpark/Scala and Databricks SQL on the Databricks platform.
- Delta Lake: Implement and optimize the Medallion architecture (Bronze, Silver, Gold) using Delta Lake to ensure data quality, consistency, and historical tracking.
- Lakehouse Platform: Implement the Lakehouse architecture efficiently on Databricks, combining best practices from DWH and Data Lake approaches.
- Performance Optimization: Optimize Databricks clusters, Spark operations, and Delta tables (e.g., Z-ordering, compaction, query tuning) to reduce latency and compute costs.
- Streaming: Design and implement real-time/near-real-time data processing solutions using Spark Structured Streaming and Delta Live Tables (DLT).
B. Governance and Security
- Unity Catalog: Implement and manage Unity Catalog for centralized data governance, fine-grained security (row/column-level security), and data lineage.
- Data Quality: Define and implement data quality standards and rules (e.g., using DLT or Great Expectations) to maintain data integrity.
C. Operations and Collaboration
- Orchestration: Develop and manage complex workflows using Databricks Workflows (Jobs) or external tools (e.g., Azure Data Factory, Airflow) to automate pipelines.
- DevOps/CI/CD: Integrate Databricks pipelines into CI/CD processes using tools such as Git, Databricks Repos, and Bundles.
- Collaboration: Work closely with Data Scientists, Analysts, and Architects to understand business requirements and deliver optimal technical solutions.
- Mentorship: Provide technical guidance and mentorship to junior developers and promote best practices.
Qualifications:
A. Mandatory Knowledge (Expert Level)
- Databricks Platform: Proven expert-level experience with the entire Databricks ecosystem (Workspace, Cluster Management, Notebooks, Databricks SQL).
- Apache Spark: In-depth knowledge of Spark architecture (RDDs, DataFrames, Spark SQL) and advanced optimization techniques.
- Delta Lake: Expertise in implementing and managing Delta Lake (ACID properties, Time Travel, Merge, Optimize, Vacuum).
- Programming Languages: Advanced/expert-level proficiency in Python (with PySpark) and/or Scala (with Spark).
- SQL: Advanced/expert-level skills in SQL and data modeling (Dimensional, 3NF, Data Vault).
- Cloud: Solid experience with a major cloud platform (AWS, Azure, or GCP), especially with storage services (S3, ADLS Gen2, GCS) and networking.
B. Additional Knowledge (Major Advantage)
- Unity Catalog: Hands-on experience with implementing and managing Unity Catalog.
- Lakeflow: Experience with Delta Live Tables (DLT) and Databricks Workflows.
- ML/AI Concepts: Understanding of basic MLOps concepts and experience with MLflow to facilitate integration with Data Science teams.
- DevOps: Experience with Terraform or equivalent tools for Infrastructure as Code (IaC).
- Certifications: Databricks certifications (e.g., Databricks Certified Data Engineer Professional) are a significant advantage.
C. Education and Experience
- Education: Bachelor's degree in Computer Science, Engineering, Mathematics, or a relevant technical field.
- Professional Experience: Minimum of 5 years of experience in Data Engineering, with at least 3 years working with Databricks and Spark at scale.
Additional Information:
Benefits
- Full access to a foreign language learning platform
- Personalized access to tech learning platforms
- Tailored workshops and training sessions to support your growth
- Medical insurance
- Meal tickets
- Monthly budget to allocate on a flexible benefits platform
- Access to 7 Card services
- Wellbeing activities and gatherings
Working model: hybrid - 2 days per week at the office
Remote Work:
Yes
Employment Type:
Full-time