Data Engineer (Azure | Databricks | Snowflake)
Overland Park, KS - USA
Job Summary
Overview:
TekWissen is a global workforce management provider headquartered in Ann Arbor, Michigan, that offers strategic talent solutions to our clients worldwide. Our client is a provider of digital technology and transformation, information technology, and services.
Position: Data Engineer (Azure | Databricks | Snowflake)
Location: Frisco, TX and Overland Park, KS
Duration: 6 Months
Job Type: Temporary Assignment
Work Type: Hybrid
JOB SUMMARY:
- Needed for an Azure-native third-party data enrichment platform using Databricks/Spark and Snowflake; focus on reliable, governed pipelines, strong Spark troubleshooting, privacy/governance, and cost-aware engineering.
Team / Business Context:
- You will join a data engineering team responsible for third-party data enrichment, augmenting first-party datasets with external identity/attribute data to support analytics, activation, and research.
- The enriched datasets are consumed by multiple downstream systems and teams including the Customer Data Platform (CDP) and other analytics/research stakeholders.
- The platform is Azure-native and built primarily on Databricks (processing, plus some ML workloads) and Snowflake (analytics/warehouse).
- A major focus is building reliable, governed, vendor-agnostic datasets while ensuring privacy/compliance, data governance, and cost efficiency.
Key Responsibilities
- As a Data Engineer, you will:
- Data Ingestion & Pipeline Development: Build and enhance ingestion pipelines for large batch and event-driven paths (streaming may evolve over time).
- Integrate data from:
  - Third-party enrichment vendors (identity attributes, very large volumes)
  - Digital platforms via Conversion API (CAPI) integrations (through intermediary/middleware)
  - Rewards/Promotions systems (e.g., TMT) for offer issuance/redemption/consumption data
- Data Quality, Reliability & Operations: Implement strong data validation, idempotency, replay/backfill strategies, and deduplication to prevent quality drift.
- Own monitoring, alerting, dashboarding, and operational readiness (wrappers around core pipelines).
- Troubleshoot failures with root-cause analysis, not just reruns:
  - Interpret Spark logs
  - Diagnose performance issues (shuffle, skew, partitioning)
  - Improve stability and SLA adherence
- Governance & Compliance (first-class NFR): Apply privacy, compliance, and governance requirements across pipelines and datasets.
- Support governance standards such as:
  - Unity Catalog lineage and access controls
  - Managing PII vs. non-PII access
  - Documentation of tables, schemas, catalogs, and cluster usage
- Cost Governance & Performance Optimization: Design pipelines with cost awareness from day one:
  - Cluster sizing, workload tuning, and efficient compute/storage usage
  - Trade-off decisions balancing cost vs. quality vs. SLA
- Collaboration & Ownership: Work in a small, fast-moving team; be self-driven and ownership-oriented.
- Raise and manage data quality escalations when issues are detected.
- Contribute to the evolving architecture (the product is early-stage; its first live month was recent).
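The reliability responsibilities above (validation, idempotency, replay/backfills, deduplication) can be illustrated with a minimal sketch. This is plain Python rather than PySpark, and every field and function name here is a hypothetical illustration, not part of the actual platform; in Spark the same "latest record wins" dedup is typically a window over the key ordered by a timestamp.

```python
# Minimal illustrative sketch (plain Python; record fields are hypothetical).
# Goal: make reruns and overlapping backfills idempotent -- processing the
# same batch twice must yield the same result.

def dedupe_latest(records):
    """Keep the newest record per key ('latest wins'), so replays and
    overlapping backfills cannot introduce duplicates."""
    latest = {}
    for rec in records:
        key = rec["record_id"]
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["record_id"])

def validate(records, required=("record_id", "updated_at")):
    """Split a batch into valid rows and quarantined rows instead of failing
    the whole job -- one common guard against silent quality drift."""
    good, bad = [], []
    for rec in records:
        (good if all(rec.get(f) is not None for f in required) else bad).append(rec)
    return good, bad

batch = [
    {"record_id": "a", "updated_at": 1, "attr": "old"},
    {"record_id": "a", "updated_at": 2, "attr": "new"},
    {"record_id": "b", "updated_at": 1, "attr": "x"},
    {"record_id": None, "updated_at": 1, "attr": "broken"},
]
good, quarantined = validate(batch)
deduped = dedupe_latest(good)
# Idempotency check: reprocessing the pipeline's own output changes nothing.
assert dedupe_latest(deduped) == deduped
```

The quarantine-and-continue pattern is what lets a pipeline surface quality escalations without blocking downstream consumers.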
Must-Have Skills (Screening Keywords)
- Candidates should have hands-on, recent experience in:
- Strong coding: PySpark and SQL (hands-on, not only orchestration)
- Databricks: notebooks/jobs, performance-tuning fundamentals, medallion patterns
- Spark fundamentals: partitioning, skew/shuffle optimization, understanding failures via logs
- Snowflake: data modeling/usage for analytics/warehousing workloads
- Azure ecosystem: Azure Data Factory (ADF) for orchestration; exposure to Azure-native integrations and services
- Data engineering reliability patterns: validation, idempotency, replay/backfills, dedup, auditability
- Data governance: Unity Catalog (preferred), lineage, access-control patterns, PII handling
- Ownership mindset: can execute independently without constant approvals/check-ins
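The skew/shuffle item above refers to the classic hot-key problem: one join or group key holds most of the rows, so a single Spark task does most of the work. A common mitigation is key salting. The sketch below simulates it in plain Python; the partitioner is a toy stand-in for Spark's hash partitioner, and the key names and salt factor are illustrative assumptions (in Spark the salt would be a derived column added before the shuffle).

```python
from collections import Counter

N_PARTITIONS = 8

def partition_of(key, n=N_PARTITIONS):
    # Toy stand-in for Spark's hash partitioner (byte sum mod n).
    return sum(key.encode()) % n

# Heavily skewed workload: one hypothetical hot key dominates.
keys = ["hot_user"] * 9000 + [f"user_{i}" for i in range(1000)]

# Without salting, every hot-key row lands on one partition.
plain = Counter(partition_of(k) for k in keys)

# Salting: append a small rotating suffix to the hot key so its rows
# spread across several partitions; aggregate per salted key first,
# then merge the partial results in a second, much cheaper pass.
SALT = 8
salted = Counter(
    partition_of(f"{k}#{i % SALT}" if k == "hot_user" else k)
    for i, k in enumerate(keys)
)

# The busiest partition shrinks dramatically after salting.
assert max(salted.values()) < max(plain.values())
```

The trade-off is a second aggregation pass over far smaller partial results, which is usually much cheaper than one straggler task processing the entire hot key.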
Nice-to-Have Skills
- Event-driven/streaming ingestion exposure (even if primary is batch today)
- Delta/Databricks patterns such as Delta Live Tables (DLT) (some workflows exist)
- Experience building config-driven export frameworks for multiple downstream consumers/vendors
- Exposure to or interest in identity-resolution concepts (ML optional; ETL strength is the priority)
- Familiarity with CAPI integrations / marketing tech data signals
- Experience implementing operational telemetry: dashboards, alerts, SLA monitoring
What Good Looks Like (Success Criteria)
- Ships reliable, well-governed datasets with strong data quality practices
- Can scale pipelines for very large volumes (hundreds of millions of records per vendor)
- Prevents silent failures where quality degrades without obvious job failures
- Balances delivery speed with compliance, governance, and cost controls
TekWissen Group is an equal opportunity employer supporting workforce diversity.