Data Engineer (Azure | Databricks | Snowflake)
Overland Park, KS - USA
Job Summary
Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Data Engineer (Azure | Databricks | Snowflake)
Location: Frisco, TX and Overland Park, KS
Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Hybrid
JOB SUMMARY:
- Needed for an Azure-native third-party data enrichment platform using Databricks/Spark and Snowflake; focus on reliable, governed pipelines, strong Spark troubleshooting, privacy/governance, and cost-aware engineering.
Team / Business Context:
- You will join a data engineering team responsible for third-party data enrichment, augmenting first-party datasets with external identity/attribute data to support analytics, activation, and research.
- The enriched datasets are consumed by multiple downstream systems and teams including the Customer Data Platform (CDP) and other analytics/research stakeholders.
- The platform is Azure-native and built primarily on Databricks (processing, plus some ML workloads) and Snowflake (analytics/warehouse).
- A major focus is building reliable, governed, vendor-agnostic datasets while ensuring privacy/compliance, data governance, and cost efficiency.
Key Responsibilities
- As a Data Engineer, you will:
- Data Ingestion & Pipeline Development: Build and enhance ingestion pipelines for large batch and event-driven paths (streaming may evolve over time).
- Integrate data from:
  - Third-party enrichment vendors (identity attributes, very large volumes)
  - Digital platforms via Conversion API (CAPI) integrations (through intermediary/middleware)
  - Rewards/Promotions systems (e.g., TMT) for offer issuance/redemption/consumption data
- Data Quality, Reliability & Operations: Implement strong data validation, idempotency, replay/backfill strategies, and deduplication to prevent quality drift.
- Own monitoring, alerting, dashboarding, and operational readiness (wrappers around core pipelines).
- Troubleshoot failures with root-cause analysis, not just reruns:
  - Interpret Spark logs
  - Diagnose performance issues (shuffle, skew, partitioning)
  - Improve stability and SLA adherence
- Governance & Compliance (first-class NFR): Apply privacy, compliance, and governance requirements across pipelines and datasets.
- Support governance standards such as:
  - Unity Catalog lineage and access controls
  - Managing PII vs. non-PII access
  - Documentation of tables, schemas, catalogs, and cluster usage
- Cost Governance & Performance Optimization: Design pipelines with cost awareness from day one:
  - Cluster sizing, workload tuning, and efficient compute/storage usage
  - Trade-off decisions balancing cost vs. quality vs. SLA
- Collaboration & Ownership: Work in a small, fast-moving team; be self-driven and ownership-oriented.
- Raise and manage data quality escalations when issues are detected.
- Contribute to the evolving architecture (the product is early-stage; its first live month was recent).
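The reliability responsibilities above (validation, idempotency, replay/backfills, deduplication) can be illustrated with a minimal sketch. This is plain Python rather than PySpark, and every field and function name here is a hypothetical illustration, not part of the actual platform; in Spark the same "latest record wins" dedup is typically a window over the key ordered by a timestamp.

```python
# Minimal illustrative sketch (plain Python; record fields are hypothetical).
# Goal: make reruns and overlapping backfills idempotent -- processing the
# same batch twice must yield the same result.

def dedupe_latest(records):
    """Keep the newest record per key ('latest wins'), so replays and
    overlapping backfills cannot introduce duplicates."""
    latest = {}
    for rec in records:
        key = rec["record_id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["record_id"])

def validate(records, required=("record_id", "updated_at")):
    """Split a batch into valid rows and quarantined rows instead of failing
    the whole job -- one common guard against silent quality drift."""
    good, bad = [], []
    for rec in records:
        (good if all(rec.get(f) is not None for f in required) else bad).append(rec)
    return good, bad

batch = [
    {"record_id": "a", "updated_at": 1, "attr": "old"},
    {"record_id": "a", "updated_at": 2, "attr": "new"},
    {"record_id": "b", "updated_at": 1, "attr": "x"},
    {"record_id": None, "updated_at": 1, "attr": "broken"},
]
good, quarantined = validate(batch)
deduped = dedupe_latest(good)
# Idempotency check: reprocessing the pipeline's own output changes nothing.
assert dedupe_latest(deduped) == deduped
```

The quarantine-and-continue pattern is what lets a pipeline surface quality escalations without blocking downstream consumers.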
Must-Have Skills (Screening Keywords)
- Candidates should have hands-on, recent experience in:
- Strong coding: PySpark and SQL (hands-on, not only orchestration)
- Databricks: notebooks/jobs, performance-tuning fundamentals, medallion patterns
- Spark fundamentals: partitioning, skew/shuffle optimization, understanding failures via logs
- Snowflake: data modeling/usage for analytics/warehousing workloads
- Azure ecosystem: Azure Data Factory (ADF) for orchestration; exposure to Azure-native integrations and services
- Data engineering reliability patterns: validation, idempotency, replay/backfills, dedup, auditability
- Data governance: Unity Catalog (preferred), lineage, access-control patterns, PII handling
- Ownership mindset: can execute independently without constant approvals/check-ins
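The skew/shuffle item above refers to the classic hot-key problem: one join or group key holds most of the rows, so a single Spark task does most of the work. A common mitigation is key salting. The sketch below simulates it in plain Python; the partitioner is a toy stand-in for Spark's hash partitioner, and the key names and salt factor are illustrative assumptions (in Spark the salt would be a derived column added before the shuffle).

```python
from collections import Counter

N_PARTITIONS = 8

def partition_of(key, n=N_PARTITIONS):
    # Toy stand-in for Spark's hash partitioner (byte sum mod n).
    return sum(key.encode()) % n

# Heavily skewed workload: one hypothetical hot key dominates.
keys = ["hot_user"] * 9000 + [f"user_{i}" for i in range(1000)]

# Without salting, every hot-key row lands on one partition.
plain = Counter(partition_of(k) for k in keys)

# Salting: append a small rotating suffix to the hot key so its rows
# spread across several partitions; aggregate per salted key first,
# then merge the partial results in a second, much cheaper pass.
SALT = 8
salted = Counter(
    partition_of(f"{k}#{i % SALT}" if k == "hot_user" else k)
    for i, k in enumerate(keys)
)

# The busiest partition shrinks dramatically after salting.
assert max(salted.values()) < max(plain.values())
```

The trade-off is a second aggregation pass over far smaller partial results, which is usually much cheaper than one straggler task processing the entire hot key.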
Nice-to-Have Skills
- Event-driven/streaming ingestion exposure (even if primary is batch today)
- Delta/Databricks patterns such as Delta Live Tables (DLT) (some workflows exist)
- Experience building config-driven export frameworks for multiple downstream consumers/vendors
- Exposure to or interest in identity-resolution concepts (ML optional; ETL strength is the priority)
- Familiarity with CAPI integrations / marketing tech data signals
- Experience implementing operational telemetry: dashboards, alerts, SLA monitoring
What Good Looks Like (Success Criteria)
- Ships reliable, well-governed datasets with strong data quality practices
- Can scale pipelines for very large volumes (hundreds of millions of records per vendor)
- Prevents silent failures where quality degrades without obvious job failures
- Balances delivery speed with compliance, governance, and cost controls
TekWissen Group is an equal opportunity employer supporting workforce diversity.