Senior Data Engineer - Lakehouse & Data Engineering Frameworks (REF5085J)


Job Location: Budapest, Hungary

Monthly Salary: Not Disclosed
Vacancies: 1

Job Summary

We are looking for a Senior Data Engineer to build and operate scalable data ingestion and CDC capabilities on our Azure-based Lakehouse platform. Beyond developing pipelines in Azure Data Factory and Databricks, you will help us mature our engineering approach: we increasingly deliver ingestion and CDC preparation through Python projects and reusable frameworks, and we expect this role to apply professional software engineering practices (clean architecture, testing, code reviews, packaging, CI/CD and operational excellence).

Our platform runs batch-first processing: streaming sources are landed raw and processed in batch, with selective evolution toward streaming where needed.

You will work within the Common Data Intelligence Hub, collaborating with data architects, analytics engineers and solution designers to enable robust data products and governed data flows across the enterprise.

  • Your team owns ingestion & CDC engineering end-to-end (design, build, operate, observability, reliability, reusable components).
  • You contribute to platform standards (contracts, layer semantics, readiness criteria) and reference implementations.
  • You do not primarily own cloud infrastructure provisioning (e.g. enterprise networking, core IaC foundations), but you collaborate with the platform team by defining requirements, reviewing changes and maintaining deployable code for pipelines and jobs.

Platform data engineering & delivery 

  • Design and develop ingestion pipelines using Azure and Databricks services (ADF pipelines, Databricks notebooks/jobs/workflows).
  • Implement and operate CDC patterns (inserts, updates, deletes), including late-arriving data and reprocessing strategies (a minimal merge sketch follows this list).
  • Structure and maintain bronze and silver Delta Lake datasets (schema enforcement, de-duplication, performance tuning).
  • Build transformation-ready datasets and interfaces (stable schemas, contracts, metadata expectations) for analytics engineers and downstream modeling.
  • Ingest data in a batch-first approach (raw landing, replayability, idempotent batch processing) and help evolve patterns toward true streaming where future use cases require it.
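To give a concrete flavour of the CDC work described above, here is a minimal PySpark/Delta Lake sketch of applying one change batch to a silver table. Table paths, the key column and the operation-flag values are hypothetical; the real pipelines on the platform may structure this differently.

```python
# Illustrative only: table paths, key columns and the 'op' flag values are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Read one raw (bronze) CDC batch: one row per change event with an operation
# flag ('I'/'U'/'D') and a change timestamp emitted by the source system.
changes = spark.read.format("delta").load("/lake/bronze/customers_cdc")

# Keep only the newest change per business key so the merge is deterministic,
# which also absorbs late-arriving duplicates within the batch.
latest = (
    changes
    .withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("change_ts").desc())
        ),
    )
    .filter("rn = 1")
    .drop("rn")
)

# Apply inserts, updates and deletes to the curated (silver) table.
silver = DeltaTable.forPath(spark, "/lake/silver/customers")
(
    silver.alias("t")
    .merge(latest.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'D'")
    .whenMatchedUpdateAll(condition="s.op != 'D'")
    .whenNotMatchedInsertAll(condition="s.op != 'D'")
    .execute()
)
```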

Software engineering for data frameworks 

  • Develop and maintain Python-based ingestion/CDC components as production-grade software (modules/packages, versioning, releases).
  • Apply engineering best practices: code reviews, unit/integration tests, static analysis, formatting/linting, type hints and clear documentation (an example test follows this list).
  • Establish and improve CI/CD pipelines for data engineering code and pipeline assets (build, test, security checks, deploy, rollback patterns).
  • Drive reuse via shared libraries, templates and reference implementations; reduce one-off notebook solutions.
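As an illustration of the testing practice mentioned above, the sketch below unit-tests a small, hypothetical deduplication helper with pytest and a local SparkSession. The helper name and columns are invented for the example; the point is that pipeline logic is factored into functions that can run in CI without a cluster.

```python
# Illustrative only: the helper and its columns are invented for this example.
import pytest
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window


def latest_change_per_key(df, key, order_by):
    """Keep only the newest change event per business key (hypothetical helper)."""
    w = Window.partitionBy(key).orderBy(F.col(order_by).desc())
    return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")


@pytest.fixture(scope="session")
def spark():
    # Small local session; runs in CI without any cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def test_latest_change_per_key_keeps_newest_row(spark):
    rows = [
        ("c1", "2024-01-01T10:00:00", "I"),
        ("c1", "2024-01-02T10:00:00", "U"),
        ("c2", "2024-01-01T09:00:00", "I"),
    ]
    df = spark.createDataFrame(rows, ["customer_id", "change_ts", "op"])

    result = latest_change_per_key(df, key="customer_id", order_by="change_ts")

    assert {r["customer_id"]: r["op"] for r in result.collect()} == {"c1": "U", "c2": "I"}
```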

Operations, reliability & observability

  • Implement logging, metrics, tracing and data pipeline observability (run-time KPIs, SLAs, alerting, incident readiness); a short sketch follows this list.
  • Troubleshoot distributed processing and production issues end-to-end.
  • Work with solution designers on event-based triggers and orchestration workflows; contribute to operational standards.
  • Implement operational and security hygiene: secure secret handling, least-privilege access patterns and support for auditability (e.g. logs/metadata/lineage expectations).
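A minimal sketch of the kind of structured run telemetry referred to above, assuming plain Python logging with JSON events that any log sink (e.g. Log Analytics) can parse and alert on. The pipeline name and KPI fields are hypothetical.

```python
# Illustrative only: pipeline names and KPI fields are hypothetical.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("ingestion.customers")

RUN_ID = str(uuid.uuid4())  # one id per pipeline run, attached to every event


def log_run_event(pipeline: str, status: str, **metrics) -> None:
    """Emit one structured JSON event that a log sink can index and alert on."""
    event = {"pipeline": pipeline, "run_id": RUN_ID, "status": status, "ts": time.time(), **metrics}
    logger.info(json.dumps(event))


# Example usage inside a batch job:
start = time.time()
rows_written = 12_345  # hypothetical result of the batch
log_run_event(
    "bronze_to_silver_customers",
    status="succeeded",
    rows_written=rows_written,
    duration_seconds=round(time.time() - start, 2),
)
```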

Collaboration & leadership 

  • Mentor other engineers and promote consistent engineering practices across teams.
  • Contribute to the Data Engineering Community of Practice and help define standards, patterns and guardrails.
  • Contribute to architectural discussions (layer semantics, readiness criteria, contracts and governance).
  • Work with architects and governance stakeholders to ensure datasets meet governance requirements (cataloging, ownership, documentation, access patterns, compliance constraints) before promotion to higher layers.

Qualifications:

  • 3-5 years of hands-on experience building data pipelines with Databricks and Azure in production.
  • Strong knowledge of Delta Lake patterns (CDC, schema evolution, deduplication, partitioning, performance optimization).
  • Advanced Python engineering skills: building maintainable projects (packaging, dependency management, testing, tooling).
  • Solid SQL skills (complex transformations, debugging, performance tuning).
  • Proven experience with CI/CD and Git-based workflows (merge requests, branching strategies, automated testing, environment promotion).
  • Ability to diagnose and resolve issues in distributed systems (Spark execution, cluster/runtime behavior, data correctness).
  • Good understanding of data modeling principles and how they influence ingestion and performance.
  • Practical experience applying data governance and security controls in a Lakehouse environment (permissions/access patterns, secure secret handling, audit needs; Unity Catalog is a plus).
  • Proactive, reliable and able to work independently within agile teams.
  • Strong communication skills in English (spoken and written).

Technical core skills

  • Databricks (Jobs/Workflows, Notebooks, Spark, Autoloader, Delta Lake); an ingestion sketch follows this list.
  • Azure Functions & Durable Functions (orchestration, long-running workflows)
  • SQL (analysis, performance tuning)
  • PySpark and Python (production-grade)
  • Azure Data Factory (pipelines, triggers, linked services, monitoring)
  • ADLS Gen2 (lake storage design, folder/partition strategy, access controls, lifecycle/retention)
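As a brief illustration of batch-first raw landing with Databricks Auto Loader, the sketch below reads newly arrived files incrementally and writes them to a bronze Delta table using trigger(availableNow=True), which processes the current backlog and then stops. The storage paths, container and target table are hypothetical.

```python
# Illustrative only: paths, container names and the target table are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader ("cloudFiles") tracks which files have already been ingested,
# which gives replayable, idempotent incremental batches over raw landing data.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/lake/_schemas/customers_raw")
    .load("abfss://landing@examplelake.dfs.core.windows.net/customers/")
)

(
    raw.writeStream.format("delta")
    .option("checkpointLocation", "/lake/_checkpoints/customers_raw")
    .trigger(availableNow=True)  # run as an incremental batch, then stop
    .toTable("bronze.customers_raw")
)
```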

Software engineering toolchain 

  • Git and code review workflows
  • CI/CD pipelines (e.g. GitLab CI, Azure DevOps)
  • Testing: unit/integration tests, test data strategies
  • Code quality: linting/formatting, static analysis, type hints
  • Packaging & dependency management (e.g. Poetry/pip-tools/conda, whichever you standardize on)

Governance, security & orchestration

  • Unity Catalog (cataloging, permissions/access patterns, basic governance controls)
  • Secure secret handling and service authentication patterns (Key Vault or equivalent); a brief example follows this list.
  • Event Grid / Azure Functions / event-driven orchestration
  • Observability (structured logging, metrics, alerting; Log Analytics or equivalent)
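For the secret-handling point above, a small sketch using azure-identity and azure-keyvault-secrets to fetch a credential at runtime instead of embedding it in code or pipeline config. The vault URL and secret name are hypothetical; on Databricks, a Key Vault-backed secret scope read via dbutils.secrets.get is a common alternative.

```python
# Illustrative only: the vault URL and secret name are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://example-kv.vault.azure.net"  # hypothetical Key Vault
credential = DefaultAzureCredential()  # managed identity in Azure, dev login locally
client = SecretClient(vault_url=vault_url, credential=credential)

# Fetch a connection secret at runtime rather than storing it in pipeline config.
jdbc_password = client.get_secret("source-db-password").value
```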

 


Additional Information:

  • Please note that remote working is only available within Hungary due to European taxation regulations.


Remote Work: No


Employment Type: Full-time



About Company


The largest ICT employer in Hungary, Deutsche Telekom IT Solutions (formerly IT-Services Hungary, ITSH) is a subsidiary of the Deutsche Telekom Group. Established in 2006, the company provides a wide portfolio of IT and telecommunications services with more than 5000 employees.
