Role: Senior/Lead Data Engineer.
Location: Toronto, ON (onsite).
Duration: Long Term Contract.
Job Overview:
- Architect and build the high-performance data foundations that power enterprise analytics. This senior role demands software engineering rigor applied to data infrastructure, optimizing latency, compute costs, and scalability using cutting-edge tools like Polars, Ibis, and Apache Griffin.
High-Performance Data Engineering:
- Build optimized data structures using Polars and Ibis for sub-second query performance at scale.
- Implement memory-efficient transformations targeting compute-cost reductions of up to 50%.
Advanced Orchestration & Governance:
- Design complex Airflow DAGs managing 100 dependencies with precise SLAs.
- Deploy Apache Griffin for automated data quality profiling, anomaly detection, and lineage tracking.
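The DAG-design bullet is really about dependency ordering under fan-out and fan-in. Airflow itself is not needed to illustrate the idea; the stdlib `graphlib` module (a deliberate stand-in here, not Airflow's API) sketches how a task graph resolves into an executable order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph mirroring a small Airflow DAG:
# extract -> transform -> {quality_check, load} -> publish.
# Each key maps a task to the set of tasks it depends on.
dag = {
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"transform"},
    "publish": {"quality_check", "load"},
}

# static_order() yields tasks in a valid execution order,
# analogous to how a scheduler walks DAG dependencies.
order = list(TopologicalSorter(dag).static_order())
```

In real Airflow the same shape is expressed with operators and `>>` dependencies, plus `sla=` on time-sensitive tasks.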
Cloud-Native Data Lake Architecture:
- Architect Azure Data Lake Storage (ADLS Gen2) with hierarchical partitioning optimized for Databricks/Synapse.
- Implement liquid clustering, Z-ordering, and predictive optimization for petabyte-scale workloads.
NoSQL & Hybrid Storage:
- Evaluate/implement Cassandra or MongoDB for high-velocity semi-structured patterns.
- Design polyglot persistence strategies balancing SQL/NoSQL for optimal access patterns.
Technical Ownership:
- Pipeline Development: Python 3.10, PySpark, Java; complex ELT patterns.
- Cloud: Azure Databricks, ADLS Gen2, ADF, Synapse Analytics.
- Orchestration: Airflow 2.7, NiFi, Hamilton.
- Data Processing: Polars, Ibis, Pandas (performance-optimized).
- DevOps: Docker, Kubernetes, GitHub Actions, Terraform.
- Quality: Griffin, Great Expectations, Monte Carlo.
- NoSQL: Cassandra, MongoDB Atlas, Cosmos DB.
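On the "Pandas (performance-optimized)" line: one of the simplest levers is dtype choice. This sketch (synthetic data, illustrative column name) shows the `category` dtype shrinking the memory footprint of a low-cardinality string column, the kind of optimization the role expects as routine.

```python
import pandas as pd

# 100k rows but only two distinct string values: a classic
# low-cardinality column where object dtype wastes memory.
df = pd.DataFrame({"region": ["east", "west"] * 50_000})

before = df["region"].memory_usage(deep=True)

# Categorical storage keeps one copy of each distinct value
# plus small integer codes per row.
df["region"] = df["region"].astype("category")

after = df["region"].memory_usage(deep=True)
```

The same reasoning drives choices like downcasting numeric columns and reading only needed columns with `usecols=`.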
Required Experience (6-10 years):
- 3 years of production Polars/Ibis (memory-efficient joins, lazy evaluation, streaming).
- 2 years of complex Airflow (dynamic DAGs, XComs, custom operators, Celery Executor).
- Azure Data Lake architecture (Delta Lake, Unity Catalog, ABFSS protocol optimization).
- PySpark mastery (Delta Live Tables, Adaptive Query Execution, Photon engine).
- NoSQL production experience (Cassandra data modeling, MongoDB aggregation pipelines).
- Long-term impact (year-long projects demonstrating sustained platform ownership).