Data Engineer 1

Inetum


Job Location:

Warsaw - Poland

Monthly Salary: Not Disclosed
Posted on: 17 hours ago
Vacancies: 1 Vacancy

Job Summary

Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!

Key responsibilities

Data pipeline development:

  • Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance
  • Ensure efficient ingestion of historical Parquet datasets into Databricks (a minimal sketch follows this list).
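A minimal PySpark sketch of one such pipeline, assuming a Databricks job context where the MS SQL JDBC driver is on the classpath; the hostname, credentials, table names, and storage paths are hypothetical placeholders, not details from this posting.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Pull a table from the on-premises MS SQL Server over JDBC.
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://onprem-host:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "etl_user")
    .option("password", "********")  # in practice, read from a secret scope
    .load()
)

# Land it as a Delta table in Databricks.
source_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# Ingest historical Parquet datasets (schemas assumed to match the target).
history_df = spark.read.parquet("/mnt/landing/orders_history/")
history_df.write.format("delta").mode("append").saveAsTable("bronze.orders")
```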

Data quality & validation:

  • Implement validation, reconciliation, and quality assurance checks to ensure accuracy and completeness of migrated data
  • Handle schema mapping, field transformations, and metadata enrichment to standardize datasets
  • Ensure data governance, quality assurance, and compliance are integral to all migration activities (see the reconciliation sketch after this list).
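A minimal sketch of post-migration reconciliation checks along these lines, assuming both the source extract and the migrated Delta table are readable from the same Spark session; the table and column names (staging.orders_extract, bronze.orders, amount, order_id) are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

source = spark.table("staging.orders_extract")
target = spark.table("bronze.orders")

# 1. Row-count parity between source and target.
assert source.count() == target.count(), "row counts diverge"

# 2. Aggregate checksum on a numeric column.
src_sum = source.agg(F.sum("amount")).first()[0]
tgt_sum = target.agg(F.sum("amount")).first()[0]
assert src_sum == tgt_sum, "amount totals diverge"

# 3. Null-rate check on the key column after migration.
null_keys = target.filter(F.col("order_id").isNull()).count()
assert null_keys == 0, "unexpected NULL keys after migration"
```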

Performance optimization:

  • Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake where appropriate
  • Manage resource usage and scheduling for large dataset transfers (a tuning sketch follows this list).
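A minimal sketch of the kind of Delta Lake tuning meant here, using Databricks SQL commands (OPTIMIZE, ZORDER, VACUUM); table and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rewrite a large historical table partitioned on a low-cardinality column,
# so downstream reads can prune whole partitions.
df = spark.table("bronze.orders_history")
(df.write.format("delta")
   .mode("overwrite")
   .partitionBy("order_year")
   .saveAsTable("bronze.orders_history_partitioned"))

# Compact small files and co-locate a frequently filtered column.
spark.sql("OPTIMIZE bronze.orders_history_partitioned ZORDER BY (customer_id)")

# Remove stale files left behind by rewrites (default retention is 7 days).
spark.sql("VACUUM bronze.orders_history_partitioned")
```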

Collaboration:

  • Work closely with AI engineers, data scientists, and business stakeholders to define data access patterns required for upcoming AI POCs
  • Partner with infrastructure teams to ensure secure connectivity between legacy systems and Databricks.

Documentation & governance:

  • Maintain technical documentation for all data pipelines
  • Adhere to data governance, compliance, and security best practices throughout the migration process.

Qualifications:

Required skills & experience:

  • Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.)
  • Hands-on experience with Databricks and the Spark ecosystem
  • Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration
  • Experience working with Microsoft SQL Server including direct database connections
  • Practical experience ingesting Parquet data and managing large historical datasets
  • Knowledge of Delta Lake and Structured Streaming in Databricks is a plus (see the streaming sketch after this list)
  • Familiarity with secure data transfer protocols between on-premises environments and cloud platforms
  • Strong problem-solving skills and ability to work independently.
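A hypothetical Structured Streaming sketch for the Delta Lake item above, using Databricks Auto Loader to pick up newly arriving Parquet files incrementally; the schema, paths, and table name are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")   # Databricks Auto Loader
    .option("cloudFiles.format", "parquet")
    .schema("order_id BIGINT, amount DOUBLE, order_ts TIMESTAMP")
    .load("/mnt/landing/orders_incremental/")
)

(stream.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders_incremental")
    .trigger(availableNow=True)             # process the backlog, then stop
    .toTable("bronze.orders_incremental"))
```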

Preferred qualifications:

  • Experience with AI/ML data preparation workflows
  • Understanding of data governance and compliance requirements related to customer and contract data
  • Familiarity with orchestration tools such as Databricks Workflows or Airflow (an Airflow sketch follows this list)
  • Experience setting up Databricks environments from scratch.
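A minimal Airflow sketch for the orchestration item above, using the community Databricks provider (apache-airflow-providers-databricks); the DAG ID, connection ID, and job ID are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
)

with DAG(
    dag_id="mssql_to_databricks_migration",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Trigger an existing Databricks job that runs the migration pipeline.
    run_migration = DatabricksRunNowOperator(
        task_id="run_migration_job",
        databricks_conn_id="databricks_default",
        job_id=12345,  # hypothetical Databricks job ID
    )
```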

Remote Work:

No


Employment Type:

Full-time


Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company


Inetum is a European leader in digital services. Inetum’s team of 28,000 consultants and specialists strive every day to make a digital impact for businesses, public sector entities and society. Inetum’s solutions aim at contributing to its clients’ performance and innovation as well ...
