- Design and implement data ingestion pipelines for batch and streaming data (a minimal ingestion sketch follows this list)
- Configure and maintain data orchestration workflows (Airflow, NiFi) and CI/CD automation for data processes (see the DAG sketch below)
- Design and organize data layers within a Data Lake architecture (HDFS, Iceberg, S3)
- Build and maintain secure, governed data environments using Apache Ranger, Atlas, and SDX
- Develop SQL queries and optimize performance for analytical workloads in Hive/Impala (see the partition-pruning sketch below)
- Collaborate on data modeling for analytics and BI, ensuring clean schemas and dimensional models
- Support machine learning workflows using Spark MLlib or Cloudera Machine Learning (CML)
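As a rough illustration of the ingestion work above, here is a minimal sketch of a streaming pipeline, assuming PySpark with the Kafka and Iceberg connectors and an Iceberg catalog already configured; the topic, schema, and table names (events, lake.raw.events) are hypothetical placeholders, not part of the posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Hypothetical event schema; in practice this would come from a schema registry.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from Kafka and parse each JSON payload.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder address
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)
parsed = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# Append the parsed events into an Iceberg table in the raw layer of the lake.
(
    parsed.writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://lake/checkpoints/events")  # placeholder path
    .toTable("lake.raw.events")
    .awaitTermination()
)
```

The same parsing logic can usually be shared with batch backfill jobs by swapping `readStream` for `read`.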
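For the orchestration side, a minimal Airflow DAG sketch that schedules a Spark job daily, assuming the apache-spark provider package is installed; the dag_id, application path, and connection are hypothetical placeholders (on Airflow versions before 2.4, `schedule` would be `schedule_interval`).

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_events_ingest",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the ingestion job above as a daily Spark application.
    ingest = SparkSubmitOperator(
        task_id="spark_ingest",
        application="/opt/jobs/events_ingest.py",  # placeholder job path
        conn_id="spark_default",
    )
```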
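And for the Hive/Impala tuning responsibility, a sketch of the most common optimization, assuming Spark with Hive support; the table and columns (sales, sale_date, amount) are hypothetical. Partitioning a fact table on its common filter column lets the engine prune partitions instead of scanning the full dataset.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-tuning").enableHiveSupport().getOrCreate()

# Hypothetical fact table, partitioned by date and stored as Parquet so
# engines like Hive and Impala can skip irrelevant partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (sale_date DATE)
    STORED AS PARQUET
""")

# The filter on the partition column is applied as partition pruning,
# so only the matching date partitions are read.
spark.sql("""
    SELECT sale_date, SUM(amount) AS revenue
    FROM sales
    WHERE sale_date >= DATE '2024-01-01'
    GROUP BY sale_date
""").show()
```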
Qualifications:
- Proven experience in building and maintaining large-scale data pipelines (batch and streaming)
- Strong knowledge of data engineering fundamentals: ETL/ELT, data governance, data warehousing, and the Medallion architecture
- Strong SQL skills for serving data from the Data Warehouse
- Minimum 3 years of experience in Python or Scala for data processing
- Hands-on experience with Apache Spark, Kafka, Airflow, and distributed systems optimization
- Experience with Apache Ranger and Atlas for security and metadata management
- Upper-Intermediate English proficiency
WILL BE A PLUS
- Experience with Cloudera Data Platform (CDP)
- Advanced SQL skills and Hive/Impala query optimization
- BS in Computer Science or a related field
- Exposure to ML frameworks and predictive modeling
Additional Information:
PERSONAL PROFILE
- Ownership mindset and proactive approach
- Ability to drive initiatives forward and suggest improvements
- Team player with shared responsibility for delivery speed, efficiency, and quality
- Excellent written and verbal communication skills
Remote Work:
No
Employment Type:
Full-time