We are seeking a Data Engineer to design, develop, and optimize data pipelines, storage solutions, and processing frameworks. The ideal candidate will have expertise in ETL workflows, big data processing, cloud platforms, and API integrations to ensure efficient data handling and high-performance analytics.
This role requires a strong foundation in data architecture, security, and performance optimization, supporting business intelligence, machine learning, and operational data needs.
Design and develop scalable data pipelines using Apache NiFi, Apache Airflow, and PySpark (an Airflow sketch follows this list).
Work with Apache Hudi for incremental and real-time data processing within a data lake (a Hudi write sketch also follows this list).
Implement batch and real-time data processing solutions using Apache Flink and Apache Spark.
Optimize data querying and federation using Trino (Presto) and PostgreSQL.
Ensure data security, governance, and access control using RBAC and cloud-native security best practices.
Automate and monitor data pipeline performance, latency, and failures.
Collaborate with data scientists, AI/ML engineers, and backend teams to optimize data availability and insights.
Implement observability and monitoring using Prometheus, OpenTelemetry, or similar tools.
Support cloud-based data lake solutions (AWS, GCP, or Azure) with best practices for storage, partitioning, and indexing.
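For illustration, here is a minimal sketch of the kind of Airflow DAG this role would own: a daily schedule that submits one PySpark job. The DAG id, script path, and Spark connection id are hypothetical placeholders, and the import assumes the Apache Spark provider package is installed.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Minimal daily pipeline: one spark-submit task (names and paths are placeholders).
with DAG(
    dag_id="daily_orders_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_orders = SparkSubmitOperator(
        task_id="ingest_orders",
        application="/opt/jobs/ingest_orders.py",  # hypothetical PySpark job script
        conn_id="spark_default",                   # assumes a configured Spark connection
    )
```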
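Likewise, a rough sketch of an incremental write into an Apache Hudi table from PySpark, assuming the Hudi Spark bundle is on the classpath; the bucket, table name, and key columns are invented for the example.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi_upsert_sketch")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Hypothetical staging data to upsert into the lake.
orders = spark.read.parquet("s3a://example-bucket/staging/orders/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",     # record key (assumed column)
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest-record tiebreaker (assumed column)
    "hoodie.datasource.write.operation": "upsert",
}

orders.write.format("hudi").options(**hudi_options).mode("append").save(
    "s3a://example-bucket/lake/orders/"  # hypothetical Hudi table path
)
```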
3-4 years of experience in Data Engineering, Big Data, or Cloud Data Solutions.
Strong programming skills in Python and SQL for data transformation and automation.
Experience with Apache NiFi for data ingestion and orchestration.
Hands-on expertise with Apache Spark (PySpark) and Apache Flink for large-scale data processing.
Knowledge of Trino (Presto) for federated querying and PostgreSQL for analytical workloads.
Experience with Apache Hudi for data lake versioning and incremental updates.
Proficiency in Apache Airflow for workflow automation and job scheduling.
Understanding of data governance, access control (RBAC), and security best practices.
Experience with observability and monitoring tools such as Prometheus, OpenTelemetry, or equivalent.
Experience working with real-time data streaming frameworks (Kafka, Pulsar, or similar); a streaming sketch follows this list.
Exposure to cloud data lake services (AWS S3, Azure Data Lake, Google Cloud Storage).
Familiarity with Infrastructure as Code (Terraform, CloudFormation, or similar) for provisioning data lake resources.
Knowledge of containerized environments (Docker, Kubernetes).
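As a rough illustration of the streaming requirement above, the sketch below reads a Kafka topic with Spark Structured Streaming and lands it as Parquet. The broker address, topic, and storage paths are assumptions, and the Kafka connector package must be on the Spark classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# Read a hypothetical clickstream topic; the broker address is a placeholder.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .option("startingOffsets", "latest")
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Land micro-batches as Parquet, with a checkpoint location for fault tolerance.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/lake/clickstream/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/clickstream/")
    .trigger(processingTime="1 minute")
    .start()
)
```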
Full-Time