Data Platform & Streaming Engineer
Job Summary
Key Responsibilities
Design and implement scalable data platform components (lake/lakehouse, data marts, event streams) to support AI/ML and analytics use cases.
Build and maintain real-time and near-real-time streaming pipelines using tools such as Kafka / Azure Event Hubs, Spark Structured Streaming / Flink, and stream-processing patterns.
Develop robust batch ingestion and transformation pipelines (ETL/ELT) using Spark, SQL, and orchestration frameworks, drawing from SAP, engineering systems, SuccessFactors, and other enterprise systems.
Implement data modeling standards (dimensional, Data Vault, medallion architecture) suitable for analytics and ML feature readiness.
Ensure end-to-end data quality through validation rules, anomaly checks, schema evolution strategies, and automated testing.
Operationalize pipelines with CI/CD, infrastructure-as-code, version control, and environment promotion standards.
Establish observability (logging, metrics, tracing), SLOs, and incident response playbooks for data/streaming services.
Apply data governance controls: lineage, cataloging, retention, access policies, encryption, and privacy-by-design.
Optimize performance and cost across compute/storage by tuning jobs, partitioning strategies, caching, and streaming backpressure handling.
Collaborate with AI/ML engineers to enable feature stores, training data pipelines, and online/offline consistency patterns.
Interface with business/domain stakeholders (e.g., project controls, engineering, supply chain) to translate requirements into data products.
Document architectures, runbooks, and standards; mentor junior engineers and promote engineering excellence.
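The data-quality responsibility above (validation rules plus anomaly checks) can be illustrated with a minimal, framework-free sketch. In practice this would typically be done with a tool such as Great Expectations or Deequ; the record fields here ("sensor_id", "value") and the z-score threshold are invented for the example.

```python
# Illustrative only: a hand-rolled validation/anomaly pass over a batch of
# records. Field names and thresholds are assumptions, not part of the role.
from statistics import mean, stdev


def validate(records, required=("sensor_id", "value")):
    """Split records into valid/invalid using simple presence/type rules."""
    valid, invalid = [], []
    for r in records:
        # Short-circuits before the type check if a required key is missing.
        if all(k in r and r[k] is not None for k in required) and isinstance(
            r["value"], (int, float)
        ):
            valid.append(r)
        else:
            invalid.append(r)
    return valid, invalid


def flag_anomalies(records, z_threshold=3.0):
    """Flag records whose 'value' lies more than z_threshold std devs from the mean."""
    values = [r["value"] for r in records]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [r for r in records if abs(r["value"] - mu) / sigma > z_threshold]
```

A production version would persist the invalid records to a quarantine table and emit metrics, rather than returning lists in memory.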
Required Qualifications
5 years of experience in data engineering, including streaming and distributed processing.
Strong hands-on experience with streaming platforms (e.g., Kafka, Azure Event Hubs, Confluent, Pulsar) and patterns (event-driven architecture, CDC, exactly-once/at-least-once delivery).
Proficiency in Spark (PySpark/Scala) and SQL; experience with Spark Structured Streaming or equivalent.
Experience building data platforms on cloud (preferably Azure): ADLS, Databricks, Synapse, Data Factory, Event Hubs, Functions, and AKS.
Strong software engineering fundamentals: Python/Scala/Java, APIs, data structures, reliability patterns.
Familiarity with data lakehouse concepts, file formats (Delta/Iceberg/Hudi, Parquet), and schema management.
Experience with CI/CD (Azure DevOps/GitHub Actions), Git, and IaC (Terraform/Bicep/ARM).
Understanding of security fundamentals: IAM/RBAC, secrets management, encryption, and compliance-aware data handling.
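The exactly-once/at-least-once distinction named in the qualifications above is often resolved in practice by pairing at-least-once delivery with an idempotent consumer, giving effectively-exactly-once processing. A minimal sketch, with all names (Event, IdempotentConsumer, processed_ids) invented for illustration:

```python
# Sketch of the idempotent-consumer pattern: the broker may redeliver a
# message, so the consumer deduplicates on event id before applying side
# effects. In production the id set would live in a durable store, not memory.
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # unique key used for deduplication
    payload: str


class IdempotentConsumer:
    """Applies each event's side effects at most once, even under redelivery."""

    def __init__(self):
        self.processed_ids = set()
        self.results = []

    def handle(self, event: Event) -> bool:
        if event.event_id in self.processed_ids:
            return False  # duplicate delivery: skip side effects
        self.results.append(event.payload.upper())  # stand-in "business logic"
        self.processed_ids.add(event.event_id)
        return True
```

The key design point is that the dedup check and the side effect should commit atomically; doing them in separate systems reintroduces duplicates on crash.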
Preferred Qualifications
Experience implementing CDC using Debezium, Kafka Connect, or cloud CDC services.
Knowledge of ML data enablement: feature engineering pipelines, feature stores, training/serving data consistency.
Experience with data governance tooling: Purview, Data Catalog, lineage/metadata management.
Exposure to containerization/orchestration (Docker, Kubernetes/AKS) for data services.
Experience with time-series/IoT or industrial data streams (e.g., sensors, telemetry) or EPC domain datasets.
Familiarity with test automation for data pipelines (Great Expectations, Deequ, custom frameworks) and data contract testing.
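The data contract testing mentioned above usually reduces to a schema-compatibility rule: a new schema is backward compatible if it preserves every existing required field at its type and adds only optional fields. A toy check under those assumptions (plain-dict schemas invented for the example; real systems would use a schema registry with Avro/Protobuf compatibility modes):

```python
# Minimal data-contract check: dict-based schemas are an illustration only.
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """True if consumers of old_schema can still read data under new_schema."""
    for name, spec in old_schema.items():
        if spec.get("required", False):
            new_spec = new_schema.get(name)
            if new_spec is None or new_spec["type"] != spec["type"]:
                return False  # required field dropped or retyped
    for name, spec in new_schema.items():
        if name not in old_schema and spec.get("required", False):
            return False  # new required field has no value in old data
    return True
```

A contract test in CI would run this check between the producer's proposed schema and the last published version, failing the build on an incompatible change.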
Education & Certifications
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent practical experience).
Preferred (optional): Azure Data Engineer Associate, Databricks certifications, Kafka/Confluent certifications.
Experience
Minimum: 5 years in data engineering with demonstrated delivery of production-grade pipelines.
Proven experience supporting real-time streaming workloads and platform reliability in enterprise environments.