Architect Principal Data Engineer
Job Summary
Architecture: Lakehouse (Medallion: Bronze/Silver/Gold)
Compute: Apache Spark (Expert level)
Storage/Table Format: Delta Lake (Required); Iceberg (Strong Plus)
Transformation: dbt (Expert level)
Orchestration: Airflow + Cosmos for scheduling dbt runs (see the DAG sketch after this list)
Infrastructure: Cloud-native (GCP preferred); Databricks/commercial tooling
Patterns: Microservices, event-driven, CI/CD, IaC (Terraform)
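On the Airflow + Cosmos pairing: Cosmos (the astronomer-cosmos package) renders a dbt project as an Airflow DAG, one run/test task pair per dbt model. A minimal sketch of such a DAG; the paths, project, and profile names are hypothetical examples, not this team's actual layout:

```python
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

# All paths and names below are hypothetical placeholders.
dbt_finance = DbtDag(
    dag_id="dbt_finance_medallion",
    # Points Cosmos at the dbt project directory baked into the Airflow image.
    project_config=ProjectConfig("/usr/local/airflow/dbt/finance"),
    # Reuses a standard dbt profiles.yml instead of an Airflow connection mapping.
    profile_config=ProfileConfig(
        profile_name="finance",
        target_name="prod",
        profiles_yml_filepath="/usr/local/airflow/dbt/profiles.yml",
    ),
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

Because each model becomes its own task, retries, alerting, and lineage stay at model granularity rather than one opaque `dbt run`.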

Key Deliverables
Platform Zero: Evaluate, select, and deploy the foundational Lakehouse infrastructure.
Core Frameworks: Build the reusable libraries/templates the rest of the engineering team will use to build pipelines (see the template sketch after this list).
Legacy Decommission: Design the migration roadmap for moving all high-priority finance/business data to the new stack.
Performance Baseline: Reduce Spark/cloud costs by at least 20% through better resource management.
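To make the Core Frameworks and Performance Baseline items concrete, here is a minimal sketch of the kind of reusable Bronze-to-Silver template a platform team might ship, with cost-conscious Spark session defaults. The table names, dedupe keys, and config choices are illustrative assumptions, not this team's standards:

```python
from pyspark.sql import DataFrame, SparkSession


def build_session(app_name: str) -> SparkSession:
    """Cost-conscious defaults: AQE right-sizes shuffle partitions at
    runtime, and dynamic allocation releases idle executors."""
    return (
        SparkSession.builder.appName(app_name)
        .config("spark.sql.adaptive.enabled", "true")
        .config("spark.dynamicAllocation.enabled", "true")
        .getOrCreate()
    )


def bronze_to_silver(
    spark: SparkSession,
    bronze_table: str,      # hypothetical, e.g. "bronze.finance_txns"
    silver_table: str,      # hypothetical, e.g. "silver.finance_txns"
    dedupe_keys: list,      # business keys to deduplicate on
) -> None:
    """One Medallion step: read raw Bronze Delta data, drop duplicate
    records on the business keys, and publish the cleaned Silver table."""
    df: DataFrame = spark.read.table(bronze_table)
    cleaned = df.dropDuplicates(dedupe_keys)
    cleaned.write.format("delta").mode("overwrite").saveAsTable(silver_table)
```

Shipping the session factory alongside the transform keeps every team on the same tuned settings instead of hand-rolled configs per pipeline.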

Requirements
Cloud Native: Deep understanding of IAM, VPCs, object storage, and serverless compute.
Migrations: Proven track record of moving petabyte-scale data from legacy systems (on-prem, Redshift, Snowflake) to a Lakehouse without data loss.
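On the "without data loss" point: migrations at this scale are typically signed off with reconciliation checks between source and target. A minimal sketch, assuming both sides are readable from Spark; the table and key names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F


def reconcile(
    spark: SparkSession,
    legacy_table: str,     # hypothetical, e.g. "legacy.finance_txns"
    lakehouse_table: str,  # hypothetical, e.g. "gold.finance_txns"
    key: str,              # primary-key column present on both sides
) -> bool:
    """Post-migration checks: row counts plus an order-independent
    checksum over the key column."""
    legacy = spark.read.table(legacy_table)
    target = spark.read.table(lakehouse_table)

    counts_match = legacy.count() == target.count()

    def checksum(df):
        # xxhash64 gives a deterministic per-row hash; summing (as decimal,
        # to avoid long overflow) makes the checksum order-independent.
        return df.select(
            F.sum(F.xxhash64(F.col(key)).cast("decimal(38,0)")).alias("c")
        ).first()["c"]

    return counts_match and checksum(legacy) == checksum(target)
```

Row counts catch dropped partitions; the key checksum catches silent duplication or substitution that counts alone would miss.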