Senior Data Engineer

RepRisk AG

Job Location:

Berlin - Germany

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About You

Are you looking for an opportunity to build robust, scalable data infrastructure that powers meaningful, cutting-edge machine learning projects? Do you want to work at a company where your contributions have a real, measurable impact, and you're recognized and rewarded for it?

If you're passionate about data architecture, pipelines, and enabling ethical tech development, then this is the perfect role for you. We value autonomy, giving you the space to bring innovative engineering solutions to life in an inclusive, feedback-oriented environment. Your work will directly support NLP and machine learning initiatives that drive corporate responsibility through technology.

Your Responsibilities

As our new Senior Data Engineer, you will architect, build, and scale a modern data platform, leveraging Databricks and lakehouse architecture principles. You will lead the design and delivery of enterprise-grade data infrastructure as part of our global Technology division. You will also:

Architect and implement end-to-end lakehouse solutions on Databricks, leveraging Delta Lake, Unity Catalog, and the Medallion architecture (Bronze/Silver/Gold)
Design, build, and maintain scalable, reliable ELT pipelines using Databricks Workflows, Delta Live Tables, and Apache Spark
Develop and optimize high-throughput streaming and batch data pipelines using Spark Structured Streaming and Auto Loader
Drive data platform performance tuning, cost optimization, and cluster/compute governance across Databricks environments
Define and enforce data contracts, schemas, and governance standards through Unity Catalog and Delta Lake
Ensure data quality, observability, and lineage across the platform using tools such as Databricks Data Observability and Great Expectations
Collaborate cross-functionally with data scientists, analysts, and platform teams to deliver reliable, self-serve data products
Establish and champion internal data engineering best practices, standards, and reusable frameworks
Stay current with the Databricks ecosystem, lakehouse trends, and emerging data engineering patterns
Participate in code reviews to maintain high standards of quality, performance, and security
Engage actively in Agile/Scrum ceremonies, contributing architectural insights and technical direction to the team

Qualifications:

You Offer

A Bachelor's degree in computer science or a related STEM field
5 years of hands-on experience in Data Engineering or a similar role
Strong proficiency in Python and SQL
Solid experience with batch processing (e.g. AWS Glue / dbt) and stream processing technologies (e.g. Kafka)
Proven experience with Dimensional Data Modelling and Data Vault methodologies
Experience with data orchestration tools such as Airflow or Dagster
Familiarity with data quality and validation frameworks (e.g. Great Expectations, SODA, or similar)
Experience integrating with metadata tools such as Collibra or OpenMetadata
Strong understanding of version control (Git) and CI/CD pipelines
Experience working with cloud platforms (AWS preferred)
Practical experience with Data Lakehouse concepts and technologies such as Databricks and Snowflake
A proactive mindset with strong ownership, initiative, and drive to push things forward
Strong communication skills with professional proficiency in English

Additionally, the following are a plus:

Delivering workflow configurations in BPM-based software such as Camunda
Experience working with Machine Learning teams and familiarity with ML/DL/NLP concepts

Additional Information:

Please note that we will only consider candidates with a valid work permit.

Remote Work:

Employment Type:

Full-time

About Company

RepRisk AG

About Us: RepRisk is a rapidly growing global company and a pioneer in the ESG data science field. Our goal is to make the world a better place by creating transparency in the business world. We are driving positive change via the power of data. We combine AI and machine learning with ...