Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!
Key responsibilities
Data pipeline development:
- Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance (a short illustrative sketch follows this list)
- Ensure efficient ingestion of historical Parquet datasets into Databricks.
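By way of illustration only, a minimal sketch of what one such migration step might look like, assuming a Databricks notebook where spark and dbutils are pre-defined; every server name, secret scope, path, and table name below is a placeholder for this posting, not a detail of the actual environment:

```python
# Minimal sketch of one migration step, assuming a Databricks notebook where
# `spark` and `dbutils` are pre-defined. The hostname, database, secret scope,
# landing path, and table names are placeholders, not details from this role.
jdbc_url = "jdbc:sqlserver://onprem-sql.example.com:1433;databaseName=ContractsDB"

# Pull one table from the on-premises MS SQL Server over a direct JDBC connection,
# with credentials kept in a Databricks secret scope rather than in code.
source_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Contracts")
    .option("user", dbutils.secrets.get(scope="migration", key="sql-user"))
    .option("password", dbutils.secrets.get(scope="migration", key="sql-password"))
    .load()
)

# Land the extracted table as a Delta table in the workspace (bronze layer assumed).
source_df.write.format("delta").mode("overwrite").saveAsTable("bronze.contracts")

# Ingest historical Parquet files from a mounted landing area into the same layer.
history_df = spark.read.parquet("/mnt/landing/contracts_history/")
history_df.write.format("delta").mode("append").saveAsTable("bronze.contracts")
```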
Data quality & validation:
- Implement validation, reconciliation, and quality assurance checks to ensure the accuracy and completeness of migrated data (see the sketch after this list)
- Handle schema mapping, field transformations, and metadata enrichment to standardize datasets
- Ensure data governance, quality assurance, and compliance are integral to all migration activities.
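Again purely as an illustration, a small sketch of the kind of reconciliation check involved, reusing the placeholder paths, table, and key-column names from the ingestion sketch above:

```python
# Minimal validation sketch, assuming a Databricks notebook where `spark` is
# pre-defined; paths, table names, and the key column are illustrative placeholders.
from pyspark.sql import functions as F

parquet_df = spark.read.parquet("/mnt/landing/contracts_history/")
delta_df = spark.table("bronze.contracts")

# Completeness: every historical row landed as Parquet should be in the Delta table.
parquet_count = parquet_df.count()
delta_count = delta_df.count()
assert delta_count >= parquet_count, (
    f"Missing rows after ingestion: parquet={parquet_count}, delta={delta_count}"
)

# Accuracy: spot-check that the migration did not introduce duplicate business keys
# (`contract_id` is an assumed key column, not taken from this posting).
duplicate_keys = (
    delta_df.groupBy("contract_id")
    .agg(F.count("*").alias("n"))
    .filter(F.col("n") > 1)
    .count()
)
assert duplicate_keys == 0, f"{duplicate_keys} duplicate contract_id values found"
```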
Performance optimization:
- Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake when appropriate (an illustrative sketch follows this list)
- Manage resource usage and scheduling for large dataset transfers.
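And a brief sketch of the sort of Delta Lake tuning in scope, assuming a Databricks runtime with Delta Lake; the partition and clustering columns are assumptions made for illustration:

```python
# Minimal tuning sketch, assuming a Databricks runtime with Delta Lake; paths,
# table names, and the partition/clustering columns are illustrative placeholders.

# Partition a large historical load by a date column so backfills and downstream
# reads only touch the files they need (`load_date` is an assumed column).
history_df = spark.read.parquet("/mnt/landing/contracts_history/")
(
    history_df.write.format("delta")
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("bronze.contracts_history")
)

# Compact small files and co-locate rows that are frequently filtered together.
spark.sql("OPTIMIZE bronze.contracts_history ZORDER BY (contract_id)")
```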
Collaboration:
- Work closely with AI engineers, data scientists, and business stakeholders to define the data access patterns required for upcoming AI POCs
- Partner with infrastructure teams to ensure a secure connection between legacy systems and Databricks.
Documentation & governance:
- Maintain technical documentation for all data pipelines
- Adhere to data governance, compliance, and security best practices throughout the migration process.
Qualifications:
Required skills & experience:
- Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.)
- Hands-on experience with Databricks and the Spark ecosystem
- Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration
- Experience working with Microsoft SQL Server including direct database connections
- Practical experience ingesting Parquet data and managing large historical datasets
- Knowledge of Delta Lake and Structured Streaming in Databricks is a plus
- Familiarity with secure data transfer protocols between on-premises environments and cloud platforms
- Strong problem-solving skills and ability to work independently.
Preferred qualifications:
- Experience with AI/ML data preparation workflows
- Understanding of data governance and compliance requirements related to customer and contract data
- Familiarity with orchestration tools such as Databricks Workflows or Airflow
- Experience setting up Databricks environments from scratch.
Remote Work:
No
Employment Type:
Full-time