Position :: Data Engineer
Location: Dallas/Plano, TX or Middletown, NJ
Local candidates only; no relocation.
Telecom experience, especially with AT&T, is a big plus but not required.
Job Description
Job Title: Data Engineer (Streaming & Full Stack Databricks)
Role Summary
We are seeking a high-performing Data Engineer to design and implement a real-time data platform using the Medallion Architecture. You will be responsible for the end-to-end development of data pipelines: from ingesting real-time source data into Bronze, transforming it into a relational Silver layer, and finally delivering high-concurrency, consumption-ready JSON Gold tables. You will act as a full-stack data professional, handling everything from infrastructure automation (DataOps) to complex nested data modeling.
Key Responsibilities
Real-Time Ingestion: Build scalable ingestion pipelines using Auto Loader and Spark Structured Streaming to capture data from Kafka, Event Hubs, or CDC sources into raw Delta tables.
Relational Transformation: Develop ELT logic to cleanse, deduplicate, and normalize data into a relational format. Ensure ACID compliance and exactly-once processing semantics.
JSON API Optimization: Design and build the Gold layer specifically for client consumption. This involves flattening or nesting data into optimized JSON structures within Delta tables to support low-latency API queries.
Advanced Orchestration: Implement and manage complex workflows using Delta Live Tables (DLT) or standard streaming tables and Databricks Workflows to ensure data freshness and lineage.
Governance & Security: Use Unity Catalog to enforce fine-grained access control (row/column level) and maintain a searchable data catalog for consuming clients.
DataOps & Automation: Own the deployment lifecycle using Databricks Asset Bundles (DABs) and CI/CD pipelines (GitHub Actions/Azure DevOps) to ensure reproducible environments.
Performance Tuning: Optimize streaming triggers, watermarking, and stateful processing to minimize latency and manage cloud costs effectively.
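As a flavor of the Silver-layer work described above, the deduplication step can be sketched in plain Python (a Spark-free illustration only; in a real pipeline this would typically be a MERGE INTO or dropDuplicates over a keyed, watermarked stream, and the id/event_ts field names here are hypothetical):

```python
from typing import Dict, List


def dedupe_latest(records: List[dict], key: str = "id", ts: str = "event_ts") -> List[dict]:
    """Keep only the most recent record per business key (Silver-layer dedup)."""
    latest: Dict[object, dict] = {}
    for rec in records:
        k = rec[key]
        # A later event time for the same key replaces the earlier record.
        if k not in latest or rec[ts] > latest[k][ts]:
            latest[k] = rec
    return list(latest.values())


raw = [
    {"id": 1, "event_ts": "2024-01-01T10:00", "status": "new"},
    {"id": 1, "event_ts": "2024-01-01T10:05", "status": "updated"},  # later duplicate wins
    {"id": 2, "event_ts": "2024-01-01T10:02", "status": "new"},
]
silver = dedupe_latest(raw)
```

The same last-writer-wins rule is what a keyed upsert into a Silver Delta table enforces at scale.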
Skills & Qualifications
1. Technical Core (Databricks & Spark)
Expert PySpark/Scala: Deep understanding of Spark internals, broadcast joins, and RDD/DataFrame partitioning.
Delta Lake Mastery: Proficiency in Delta features such as Z-Ordering, Liquid Clustering, Change Data Feed (CDF), and Time Travel.
Streaming Patterns: Hands-on experience with watermarking, checkpoints, and handling late-arriving data in Structured Streaming.
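To illustrate the watermarking pattern named above, here is a minimal, Spark-free simulation of how an event-time watermark classifies late-arriving data (the 10-minute delay and the event_time field are illustrative assumptions; in actual Structured Streaming this is declared with withWatermark on a streaming DataFrame):

```python
from datetime import datetime, timedelta


def apply_watermark(events, delay=timedelta(minutes=10)):
    """Toy model of an event-time watermark: the watermark trails the maximum
    event time seen so far by `delay`; events older than the watermark are
    treated as too late and dropped."""
    max_event_time = None
    accepted, dropped = [], []
    for e in events:
        t = e["event_time"]
        if max_event_time is None or t > max_event_time:
            max_event_time = t
        watermark = max_event_time - delay
        if t >= watermark:
            accepted.append(e)
        else:
            dropped.append(e)
    return accepted, dropped


evts = [
    {"event_time": datetime(2024, 1, 1, 10, 0)},
    {"event_time": datetime(2024, 1, 1, 10, 30)},  # advances the watermark to 10:20
    {"event_time": datetime(2024, 1, 1, 10, 5)},   # older than 10:20, so dropped
]
on_time, late = apply_watermark(evts)
```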
2. Data Modeling & Languages
SQL: Expert-level SQL for complex transformations and window functions.
JSON/Semi-Structured Data: Mastery of parsing and generating complex nested JSON objects within Spark (e.g. struct, array, to_json, from_json).
Medallion Design: Proven experience moving data across Bronze, Silver, and Gold layers with clear data contracts.
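The nested-JSON Gold shape described above can be sketched in plain Python; the customer/order schema is entirely hypothetical, and in Spark the same rollup would be expressed with struct, collect_list, and to_json before writing the Gold Delta table:

```python
import json
from collections import defaultdict


def to_gold_json(silver_rows):
    """Roll flat Silver rows (one row per order line) up into one nested
    JSON document per customer, ready for low-latency API consumption."""
    by_customer = defaultdict(list)
    for row in silver_rows:
        by_customer[row["customer_id"]].append(
            {"order_id": row["order_id"], "amount": row["amount"]}
        )
    return [
        json.dumps({"customer_id": cid, "orders": orders}, sort_keys=True)
        for cid, orders in sorted(by_customer.items())
    ]


silver = [
    {"customer_id": "c1", "order_id": "o1", "amount": 10.0},
    {"customer_id": "c1", "order_id": "o2", "amount": 5.0},
    {"customer_id": "c2", "order_id": "o3", "amount": 7.5},
]
gold = to_gold_json(silver)
```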
3. Full Stack & DevOps
CI/CD: Experience automating data pipeline deployments (Git-based workflows).
Observability: Ability to set up monitoring and alerts using Databricks SQL Alerts or Grafana to track pipeline lag.
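The pipeline-lag condition behind such an alert can be sketched as follows (the 15-minute threshold and timestamps are illustrative; in practice this check would live in a Databricks SQL Alert or a Grafana alert rule):

```python
from datetime import datetime, timedelta, timezone


def lag_alert(last_processed_ts, threshold=timedelta(minutes=15), now=None):
    """Return (lag, fired): the lag between now and the newest processed
    event, and whether that lag exceeds the alerting threshold."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_processed_ts
    return lag, lag > threshold


# Newest processed event is 30 minutes behind "now", so the alert fires.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
lag, fired = lag_alert(datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc), now=now)
```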
4. Soft Skills
Architectural Thinking: Ability to decide between continuous and AvailableNow streaming triggers based on cost vs. latency requirements.
Client Focus: Understanding of how an API client (e.g. a React app or a microservice) will consume the Gold-layer JSON.