To develop, implement, and optimize complex Data Warehouse (DWH) and Data Lakehouse solutions using the Databricks platform (including Delta Lake, Unity Catalog, and Spark) to ensure a scalable, high-performance, and governed data foundation for analytics, reporting, and Machine Learning.
Responsibilities
A. Databricks Development and Architecture
- Advanced Design and Implementation: Design and implement robust, scalable, and high-performance ETL/ELT data pipelines using PySpark/Scala and Databricks SQL on the Databricks platform.
- Delta Lake: Implement and optimize the Medallion architecture (Bronze, Silver, Gold) using Delta Lake to ensure data quality, consistency, and historical tracking.
- Lakehouse Platform: Implement the Lakehouse architecture efficiently on Databricks, combining best practices from DWH and Data Lake approaches.
- Performance Optimization: Optimize Databricks clusters, Spark operations, and Delta tables (e.g., Z-ordering, compaction, query tuning) to reduce latency and compute costs.
- Streaming: Design and implement real-time/near-real-time data processing solutions using Spark Structured Streaming and Delta Live Tables (DLT).
B. Governance and Security
- Unity Catalog: Implement and manage Unity Catalog for centralized data governance, fine-grained security (row/column-level security), and data lineage.
- Data Quality: Define and implement data quality standards and rules (e.g., using DLT or Great Expectations) to maintain data integrity.
C. Operations and Collaboration
- Orchestration: Develop and manage complex workflows using Databricks Workflows (Jobs) or external tools (e.g., Azure Data Factory, Airflow) to automate pipelines.
- DevOps/CI/CD: Integrate Databricks pipelines into CI/CD processes using tools such as Git, Databricks Repos, and Bundles.
- Collaboration: Work closely with Data Scientists, Analysts, and Architects to understand business requirements and deliver optimal technical solutions.
- Mentorship: Provide technical guidance and mentorship to junior developers and promote best practices.
Qualifications:
A. Mandatory Knowledge (Expert Level)
- Databricks Platform: Proven expert-level experience with the entire Databricks ecosystem (Workspace, Cluster Management, Notebooks, Databricks SQL).
- Apache Spark: In-depth knowledge of Spark architecture (RDDs, DataFrames, Spark SQL) and advanced optimization techniques.
- Delta Lake: Expertise in implementing and managing Delta Lake (ACID properties, Time Travel, Merge, Optimize, Vacuum).
- Programming Languages: Advanced/expert-level proficiency in Python (with PySpark) and/or Scala (with Spark).
- SQL: Advanced/expert-level skills in SQL and data modeling (Dimensional, 3NF, Data Vault).
- Cloud: Solid experience with a major cloud platform (AWS, Azure, or GCP), especially with storage services (S3, ADLS Gen2, GCS) and networking.
B. Additional Knowledge (Major Advantage)
- Unity Catalog: Hands-on experience with implementing and managing Unity Catalog.
- Lakeflow: Experience with Delta Live Tables (DLT) and Databricks Workflows.
- ML/AI Concepts: Understanding of basic MLOps concepts and experience with MLflow to facilitate integration with Data Science teams.
- DevOps: Experience with Terraform or equivalent tools for Infrastructure as Code (IaC).
- Certifications: Databricks certifications (e.g., Databricks Certified Data Engineer Professional) are a significant advantage.
C. Education and Experience
- Education: Bachelor's degree in Computer Science, Engineering, Mathematics, or a relevant technical field.
- Professional Experience: Minimum of 5 years of experience in Data Engineering, with at least 3 years working with Databricks and Spark at scale.
Additional Information:
Benefits
- Full access to a foreign language learning platform
- Personalized access to tech learning platforms
- Tailored workshops and training sessions to support your growth
- Medical insurance
- Meal tickets
- Monthly budget to allocate on a flexible benefits platform
- Access to 7 Card services
- Wellbeing activities and gatherings
Working model: hybrid - 2 days per week at the office
Remote Work:
Yes
Employment Type:
Full-time