Submissions must include a LinkedIn profile.
Key Responsibilities:
Design, build, and maintain data pipelines across on-prem Hadoop and AWS
Develop and maintain Java applications, utilities, and data processing libraries
Manage and enhance internal Java libraries used for ingestion, validation, and transformation
Migrate and sync data from on-prem HDFS to AWS S3
Develop and maintain Airflow DAGs for orchestration and scheduling
Work with Kafka-based streaming pipelines for real-time/near-real-time ingestion
Build and optimize Spark/PySpark jobs for large-scale data processing
Use Hive, Presto/Trino, and Athena for querying and validation
Implement data quality checks, monitoring, and alerting
Support Iceberg tables and AWS external tables
Troubleshoot production issues and ensure SLA compliance
Collaborate with platform, analytics, and observability teams
Technical Skills Required:
Java (development, maintenance, and build tools such as Gradle)
AWS (S3, Glue, EMR, Athena, EKS basics)
Hadoop/HDFS and Hive
Apache Kafka (producers/consumers, topics, streaming ingestion)
Apache Spark/PySpark (batch and streaming processing)
Apache Airflow (DAG development and maintenance)
Python
Git and CI/CD workflows
Observability tools (Prometheus/Grafana)
SQL
Comments for Suppliers:
Submissions must include a LinkedIn profile.