Data Engineer

Soft Source Inc


Job Location: Houston, MS - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1 Vacancy

Job Summary

Overview:
Delivers the Palantir Foundry exit on a modern Snowflake stack by building reliable, performant, and testable ELT pipelines; recreates Foundry transformations and rule-based event logic; and ensures historical data extraction, reconciliation, and cutover readiness.
Years of Experience:
7 years overall; 3 years hands-on with Snowflake.
Key Responsibilities:
  • Extract historical datasets from Palantir (dataset export, Parquet) to S3/ADLS and load into Snowflake; implement checksum and reconciliation controls.
  • Rebuild Foundry transformations as dbt models and/or Snowflake SQL; implement curated schemas and incremental patterns using Streams and Tasks.
  • Implement the batch event/rules engine that evaluates time-series plus reference data on a schedule (e.g., every 30–60 minutes) and produces auditable event tables.
  • Configure orchestration in Airflow running on AKS and, where appropriate, Snowflake Tasks; monitor, alert, and document operational runbooks.
  • Optimize warehouses, queries, clustering, and caching; manage cost with Resource Monitors and usage telemetry.
  • Author automated tests (dbt tests, Great Expectations, or equivalent), validate parity versus legacy outputs, and support UAT and cutover.
  • Collaborate with BI/analytics teams (Sigma, Power BI) on dataset contracts, performance, and security requirements.
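The checksum and reconciliation controls mentioned above can be sketched in plain Python. This is a minimal illustration, not the team's actual tooling: all function names are hypothetical, and it assumes rows have already been fetched from the legacy export and the Snowflake load as tuples of column values.

```python
import hashlib

def row_fingerprint(row):
    """Stable per-row digest: join column values in a fixed order and hash.
    None is normalized to an empty string so both sides encode it the same way."""
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def table_checksum(rows):
    """Order-independent table checksum: XOR the integer value of each row's
    digest, so the same rows in any load order yield the same checksum.
    (Caveat: pairs of identical duplicate rows cancel out under XOR.)"""
    acc = 0
    for row in rows:
        acc ^= int(row_fingerprint(row), 16)
    return acc

def reconcile(source_rows, target_rows):
    """Row-count and checksum parity between the legacy export and the
    Snowflake load; both checks must pass before cutover sign-off."""
    return {
        "row_count_match": len(source_rows) == len(target_rows),
        "checksum_match": table_checksum(source_rows) == table_checksum(target_rows),
    }
```

In practice the same idea is usually pushed down into SQL (e.g., aggregating per-row hashes on both platforms) so the data never leaves the warehouse; the Python version just makes the control auditable and testable in isolation.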
Required Qualifications:
  • Strong Snowflake SQL and Python for ELT utilities and data validation.
  • Production experience with dbt (models, tests, macros, documentation, lineage).
  • Orchestration with Airflow (preferably on AKS/Kubernetes) and use of Snowflake Tasks/Streams for incremental loads.
  • Proficiency with cloud object storage (S3/ADLS), file formats (Parquet/CSV), and bulk/incremental load patterns (Snowpipe, External Tables).
  • Version control and CI/CD with GitHub/GitLab; environment promotion and release hygiene.
  • Data quality and reconciliation fundamentals, including checksums, row/aggregate parity, and schema integrity tests.
  • Performance and cost tuning using query profiles, micro-partitioning behavior, and warehouse sizing policies.
Preferred Qualifications:
  • Experience migrating from legacy platforms (Palantir Foundry, Cloudera/Hive/Spark) and familiarity with Trino/Starburst federation patterns.
  • Time-series data handling and rules/pattern detection; exposure to Snowpark or UDFs for complex transforms.
  • Familiarity with consumption patterns in Sigma and Power BI (Import, DirectQuery, composite models, RLS/OLS considerations).
  • Security and governance in Snowflake (RBAC, masking, row/column policies), tagging, and cost allocation.
  • Exposure to containerized workloads on AKS, lightweight apps for surfacing data (e.g., Streamlit), and basic observability practices.
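The "time-series plus reference data" rules engine described in the responsibilities can be reduced to a simple batch pass. The sketch below is a hypothetical illustration of the pattern, not the actual engine: each scheduled run scans a window of readings, joins them against per-sensor limits from reference data, and emits auditable event rows.

```python
def evaluate_rules(readings, limits):
    """One batch rule pass: flag readings that exceed the per-sensor limit
    taken from reference data, and emit one auditable event row per breach.

    readings -- iterable of (timestamp, sensor_id, value) tuples
    limits   -- dict mapping sensor_id to its threshold (reference data)
    """
    events = []
    for ts, sensor, value in readings:
        limit = limits.get(sensor)
        if limit is not None and value > limit:
            events.append({
                "event_time": ts,    # when the breaching reading occurred
                "sensor": sensor,
                "value": value,
                "limit": limit,      # threshold in force, kept for audit
                "rule": "over_limit",
            })
    return events
```

In the target architecture this logic would live in Snowflake SQL or Snowpark and be triggered by a Task or Airflow DAG on the 30–60 minute cadence; persisting the matched limit alongside each event is what makes the output table auditable after reference data changes.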

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala