Data Engineer

Austin, TX - USA

Monthly Salary: Not Disclosed

Posted on: 21 hours ago

Vacancies: 1

Job Summary

Submissions must include a LinkedIn profile.

Key Responsibilities:

Design, build, and maintain data pipelines across on-prem Hadoop and AWS

Develop and maintain Java applications, utilities, and data processing libraries

Manage and enhance internal Java libraries used for ingestion, validation, and transformation

Migrate and sync data from on-prem HDFS to AWS S3

Develop and maintain Airflow DAGs for orchestration and scheduling

Work with Kafka-based streaming pipelines for real-time/near-real-time ingestion

Build and optimize Spark / PySpark jobs for large-scale data processing

Use Hive, Presto/Trino, and Athena for querying and validation

Implement data quality checks, monitoring, and alerting

Support Iceberg tables and AWS external tables

Troubleshoot production issues and ensure SLA compliance

Collaborate with platform, analytics, and observability teams

Technical Skills Required:

Java (development, maintenance, build tools like Gradle)

AWS (S3, Glue, EMR, Athena, EKS basics)

Hadoop/HDFS, Hive

Apache Kafka (producers/consumers, topics, streaming ingestion)

Apache Spark / PySpark (batch and streaming processing)

Apache Airflow (DAG development and maintenance)

Python

Git and CI/CD workflows

Observability tools (Prometheus/Grafana)

SQL
