Design, build, and maintain high-performance data pipelines that power analytics and machine learning products. Collaborate with data scientists, product, and infrastructure teams to turn raw data into scalable, reliable assets.
Key Responsibilities
- Architect end-to-end batch and Spark Streaming pipelines on cloud platforms (AWS/GCP/Azure); see the sketch after this list.
- Implement ML feature pipelines and real-time inference services.
- Optimize petabyte-scale processing with Spark, Kafka, and Flink.
- Build and maintain data warehouses and lakes (Redshift, BigQuery, Snowflake).
- Enforce data quality, governance, and security.
- Develop CI/CD, monitoring, and alerting for pipelines.
- Mentor engineers and drive best-practice documentation.
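For illustration only, a minimal PySpark sketch of the kind of Kafka-to-data-lake streaming pipeline described above, using Spark's Structured Streaming API; the broker address, topic name, schema, and storage paths are hypothetical placeholders, not part of this role's actual stack.

```python
# Hypothetical sketch: read events from Kafka, aggregate per user over
# 5-minute windows, and append results to a cloud data lake as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("events-streaming-sketch").getOrCreate()

# Placeholder event schema.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read raw events from a Kafka topic (broker and topic are placeholders).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON payload and aggregate amounts per user per 5-minute window,
# tolerating up to 10 minutes of late-arriving data via a watermark.
events = raw.select(
    F.from_json(F.col("value").cast("string"), event_schema).alias("e")
).select("e.*")
agg = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "user_id")
    .agg(F.sum("amount").alias("total_amount"))
)

# Append finalized windows to a data lake path (placeholder) with checkpointing.
query = (
    agg.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "s3://example-bucket/aggregates/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/aggregates/")
    .start()
)
query.awaitTermination()
```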
Required Experience
- 5 years of production-grade data engineering experience.
- Deep expertise with Apache Spark (batch and streaming), Kafka, and distributed processing.
- Hands-on ML pipeline experience (feature engineering, model training, deployment).
- Experience with cloud data platforms and warehousing.
- Strong SQL and Python/Scala; familiarity with Airflow, dbt, or similar.