The Data Integration Intern will play a key role in developing scalable, automated data-processing workflows to support ongoing data science initiatives. This role involves building ingestion pipelines in Azure Databricks, creating Python- and PySpark-based data-cleaning and validation workflows, and implementing corporate standards to ensure full traceability, clear data lineage, and reproducible processes.
What you're responsible for:
- Developing automated data-cleaning and validation workflows using Python and PySpark notebooks, along with Databricks pipelines, to support a data science project.
- Building robust ingestion pipelines in Databricks to efficiently load, process, and prepare data for downstream analytics and modeling.
- Ensuring full traceability of data-cleaning methodologies by designing workflows that follow the Medallion Architecture (Bronze, Silver, Gold), maintaining clear lineage and reproducibility.
- Implementing corporate standards for data-cleaning notebooks to improve readability, consistency, maintainability, and ease of handoff across teams.
- Developing reusable, well-documented functions (when necessary) that are readable, modular, and include strong error-handling measures to support scalable and reliable data processing.
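For illustration only, a minimal sketch of the kind of work described above, assuming hypothetical table and column names (a simple Bronze-to-Silver cleaning step in PySpark with basic validation and error handling):

    # Minimal sketch of a Bronze -> Silver cleaning step; table and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    def clean_orders(bronze_table: str, silver_table: str) -> None:
        """Deduplicate, enforce non-null keys, and write the Silver table."""
        try:
            df = spark.read.table(bronze_table)
            cleaned = (
                df.dropDuplicates(["order_id"])                      # hypothetical key column
                  .filter(F.col("order_id").isNotNull())             # basic validation rule
                  .withColumn("_cleaned_at", F.current_timestamp())  # simple lineage marker
            )
            cleaned.write.mode("overwrite").saveAsTable(silver_table)
        except Exception as exc:
            # Fail with context rather than silently, so the pipeline run stays traceable.
            raise RuntimeError(f"Cleaning {bronze_table} failed") from exc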
Qualifications:
To join our team:
- Bachelor's or Master's degree in Computer Science, Data Science, Data Engineering, or a related field.
- 1–2 years of experience in a similar corporate or professional environment.
- 1 year of experience building ingestion pipelines using PySpark and Azure Databricks is considered a strong asset.
- Hands-on experience working on data science projects within a corporate setting, including exposure to model-ready data preparation and validation workflows.
- Proficiency in Python, including experience with Jupyter notebooks, Azure Data Warehouse, and Databricks/Fabric/PySpark notebooks.
- Experience contributing to exploratory data analysis (EDA) and strategic data cleaning for data science initiatives is considered a strong asset.
Additional information:
The advantages:
- Eligible for paid vacation from the first day;
- Comprehensive group insurance plan;
- Telemedicine (Unlimited access to a doctor 24/7);
- Group RRSP with employer matching contributions;
- AIM tuition assistance grants;
- Referral program;
- Exclusive employee discounts on parts at any Honeycomb location;
- Company-organized events throughout the year (barbecue, Christmas party, etc.);
- Exceptional advancement opportunities.
Remote Work: No
Employment Type: Full-time