Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!
Key responsibilities
Data pipeline development:
- Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance (a short illustrative sketch follows this list)
- Ensure efficient ingestion of historical Parquet datasets into Databricks.
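By way of illustration only, a minimal sketch of what one such migration step might look like, assuming a Databricks notebook where spark and dbutils are pre-defined; every server name, secret scope, path, and table name below is a placeholder for this posting, not a detail of the actual environment:

```python
# Minimal sketch of one migration step, assuming a Databricks notebook where
# `spark` and `dbutils` are pre-defined. The hostname, database, secret scope,
# landing path, and table names are placeholders, not details from this role.
jdbc_url = "jdbc:sqlserver://onprem-sql.example.com:1433;databaseName=ContractsDB"

# Pull one table from the on-premises MS SQL Server over a direct JDBC connection,
# with credentials kept in a Databricks secret scope rather than in code.
source_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Contracts")
    .option("user", dbutils.secrets.get(scope="migration", key="sql-user"))
    .option("password", dbutils.secrets.get(scope="migration", key="sql-password"))
    .load()
)

# Land the extracted table as a Delta table in the workspace (bronze layer assumed).
source_df.write.format("delta").mode("overwrite").saveAsTable("bronze.contracts")

# Ingest historical Parquet files from a mounted landing area into the same layer.
history_df = spark.read.parquet("/mnt/landing/contracts_history/")
history_df.write.format("delta").mode("append").saveAsTable("bronze.contracts")
```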
Data quality & validation:
- Implement validation, reconciliation, and quality assurance checks to ensure the accuracy and completeness of migrated data (see the sketch after this list)
- Handle schema mapping, field transformations, and metadata enrichment to standardize datasets
- Ensure data governance, quality assurance, and compliance are integral to all migration activities.
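Again purely as an illustration, a small sketch of the kind of reconciliation check involved, reusing the placeholder paths, table, and key-column names from the ingestion sketch above:

```python
# Minimal validation sketch, assuming a Databricks notebook where `spark` is
# pre-defined; paths, table names, and the key column are illustrative placeholders.
from pyspark.sql import functions as F

parquet_df = spark.read.parquet("/mnt/landing/contracts_history/")
delta_df = spark.table("bronze.contracts")

# Completeness: every historical row landed as Parquet should be in the Delta table.
parquet_count = parquet_df.count()
delta_count = delta_df.count()
assert delta_count >= parquet_count, (
    f"Missing rows after ingestion: parquet={parquet_count}, delta={delta_count}"
)

# Accuracy: spot-check that the migration did not introduce duplicate business keys
# (`contract_id` is an assumed key column, not taken from this posting).
duplicate_keys = (
    delta_df.groupBy("contract_id")
    .agg(F.count("*").alias("n"))
    .filter(F.col("n") > 1)
    .count()
)
assert duplicate_keys == 0, f"{duplicate_keys} duplicate contract_id values found"
```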
Performance optimization:
- Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake when appropriate (an illustrative sketch follows this list)
- Manage resource usage and scheduling for large dataset transfers.
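And a brief sketch of the sort of Delta Lake tuning in scope, assuming a Databricks runtime with Delta Lake; the partition and clustering columns are assumptions made for illustration:

```python
# Minimal tuning sketch, assuming a Databricks runtime with Delta Lake; paths,
# table names, and the partition/clustering columns are illustrative placeholders.

# Partition a large historical load by a date column so backfills and downstream
# reads only touch the files they need (`load_date` is an assumed column).
history_df = spark.read.parquet("/mnt/landing/contracts_history/")
(
    history_df.write.format("delta")
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("bronze.contracts_history")
)

# Compact small files and co-locate rows that are frequently filtered together.
spark.sql("OPTIMIZE bronze.contracts_history ZORDER BY (contract_id)")
```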
Collaboration:
- Work closely with AI engineers, data scientists, and business stakeholders to define the data access patterns required for upcoming AI POCs
- Partner with infrastructure teams to ensure a secure connection between legacy systems and Databricks.
Documentation & governance:
- Maintain technical documentation for all data pipelines
- Adhere to data governance, compliance, and security best practices throughout the migration process.
Qualifications:
Required skills & experience:
- Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.)
- Hands-on experience with Databricks and the Spark ecosystem
- Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration
- Experience working with Microsoft SQL Server including direct database connections
- Practical experience ingesting Parquet data and managing large historical datasets
- Knowledge of Delta Lake and Structured Streaming in Databricks is a plus
- Familiarity with secure data transfer protocols between on-premises environments and cloud platforms
- Strong problem-solving skills and ability to work independently.
Preferred qualifications:
- Experience with AI/ML data preparation workflows
- Understanding of data governance and compliance requirements related to customer and contract data
- Familiarity with orchestration tools such as Databricks Workflows or Airflow
- Experience setting up Databricks environments from scratch.
Remote Work:
No
Employment Type:
Full-time