- Design, build, and operate scalable, cloud-native data pipelines and services that deliver high-quality, domain-owned data products.
- Implement data mesh principles by helping domains publish, discover, and consume data products using shared standards and infrastructure.
- Build and maintain real-time and batch pipelines using tools like Spark, Kafka, and Airflow, ensuring reliability and performance at scale.
- Develop metadata, lineage, and catalog integrations so data products are easily discoverable and trusted across domains.
- Work directly with data producers and consumers to define schemas, contracts, and access patterns that improve interoperability.
- Automate testing, validation, and deployment through CI/CD pipelines to ensure fast, consistent delivery of data products.
- Monitor and troubleshoot data pipelines and systems, driving improvements in observability, scalability, and cost efficiency.
- Collaborate closely with platform engineers to enhance self-serve tooling and streamline onboarding for new data domains.
- 5 years of experience building and operating data pipelines and distributed systems in cloud environments (AWS, GCP, or Azure).
- Hands-on experience implementing data mesh concepts: data products, domain ownership, federated standards, and self-service patterns.
- Strong programming skills in Python, Scala, or Java for developing scalable ETL/ELT and data services.
- Expert-level SQL and experience with modern data warehouses (e.g., Snowflake, BigQuery, Redshift).
- Proven experience with streaming and orchestration frameworks (e.g., Kafka, Spark, Airflow, dbt).
- Practical knowledge of Kubernetes, containerization, and CI/CD automation for data engineering workflows.
- Experience supporting AI/ML data enablement, including feature pipelines, vector databases, and model-serving data requirements.
- Strong understanding of data quality, observability, and schema versioning in distributed environments.
- Experience implementing or consuming data catalogs and governance frameworks (e.g., DataHub, Amundsen, Collibra).
- Familiarity with open table formats (Iceberg, Delta, Hudi) and lakehouse architectures.
- Experience building APIs or SDKs for data product publishing and consumption.
- Exposure to self-serve analytics tools (Looker, Tableau, Streamlit) and BI use cases.
- Passion for automation, clean code, and continuous learning in fast-moving data ecosystems.