Job Description:
Sound concepts of large-scale Data Warehouse and Data Lake design; ETL/ELT; Ab Initio; Apache Spark; PySpark; SQL; Oracle; Hadoop.
Advanced dimensional modeling, data vault, and schema design for large-scale Data Warehouses and Data Lakes.
Deep expertise in ETL/ELT engineering using Ab Initio (graphs, plans, PDL, metadata-driven design) and migration of those patterns to Spark.
Hands-on PySpark/Spark proficiency for batch and streaming joins, windowing, partitioning, and performance tuning on large datasets (see the first sketch below).
Strong command of Hadoop ecosystem components: HDFS, Hive, YARN, Oozie/Airflow, Ranger, Atlas, and security/governance frameworks.
Oracle SQL mastery, including performance tuning, partitioning, materialized views, and implementing and interpreting Virtual Private Database (VPD) policies (see the VPD sketch below).
Data ingestion architecture using CDC, Kafka, file-based ingestion, and incremental load frameworks for high-volume HR and financial data (see the streaming-ingestion sketch below).
Data quality engineering: reconciliation frameworks, validation rules, audit controls, lineage, and automated regression testing (see the reconciliation sketch below).
Cloud and lakehouse engineering on Databricks: Delta Lake, Unity Catalog, cluster optimization, job orchestration, and CI/CD (see the Delta merge sketch below).
Metadata-driven pipeline design, reusable transformation frameworks, and parameterized job orchestration patterns (see the metadata-driven sketch below).
Performance engineering across platforms: skew mitigation, partition strategy, broadcast vs. shuffle decisions, and storage format optimization (Parquet/ORC/Delta) (see the join-strategy sketch below).
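The sketches below are illustrative only: one hedged, minimal example per skill area, with every table name, path, topic, and parameter invented for this posting rather than taken from any real environment. First, a PySpark windowing and partitioning sketch: keeping the latest record per key with a ranking window, then repartitioning on the join key ahead of a wide join.

```python
# Hypothetical sketch: deduplicate a change feed with a window function,
# then repartition before a wide join. Names (employee_events, emp_id,
# event_ts) and the partition count are illustrative.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("windowing-sketch").getOrCreate()

events = spark.read.parquet("/data/employee_events")  # assumed path

# Keep only the most recent event per employee.
w = Window.partitionBy("emp_id").orderBy(F.col("event_ts").desc())
latest = (
    events
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Repartition on the join key so the subsequent wide join shuffles evenly.
latest = latest.repartition(200, "emp_id")
```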
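A minimal VPD sketch, assuming a python-oracledb session with EXECUTE on DBMS_RLS; the schema, table, context, and policy names are all hypothetical. The policy function returns a predicate string that Oracle appends to matching statements once the policy is registered.

```python
# Hypothetical sketch: register a row-level VPD policy from Python.
# Credentials, DSN, and all object names are placeholders.
import oracledb

conn = oracledb.connect(user="hr_admin", password="example", dsn="dbhost:1521/orclpdb1")
cur = conn.cursor()

# Policy function: restrict rows to the department held in a session context.
cur.execute("""
    CREATE OR REPLACE FUNCTION hr_admin.dept_predicate (
        p_schema IN VARCHAR2, p_object IN VARCHAR2
    ) RETURN VARCHAR2 IS
    BEGIN
        RETURN 'dept_id = SYS_CONTEXT(''hr_ctx'', ''dept_id'')';
    END;
""")

# Attach the predicate to the table for queries and DML.
cur.execute("""
    BEGIN
        DBMS_RLS.ADD_POLICY(
            object_schema   => 'HR',
            object_name     => 'EMPLOYEES',
            policy_name     => 'EMP_DEPT_POLICY',
            function_schema => 'HR_ADMIN',
            policy_function => 'DEPT_PREDICATE',
            statement_types => 'SELECT,UPDATE,DELETE');
    END;
""")
conn.close()
```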
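A streaming-ingestion sketch: consuming a CDC topic from Kafka with Structured Streaming and landing micro-batches in the lake. The broker address, topic, and paths are invented.

```python
# Hypothetical sketch: read a CDC topic and land it as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

cdc = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "hr.employees.cdc")            # assumed topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload for downstream parsing.
parsed = cdc.select(F.col("value").cast("string").alias("payload"))

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/lake/bronze/employees_cdc")
    .option("checkpointLocation", "/chk/employees_cdc")
    .trigger(processingTime="1 minute")
    .start()
)
```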
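A reconciliation sketch comparing a row count and a control total between a source extract and the loaded target; the paths, the amount column, and the tolerance are assumptions.

```python
# Hypothetical sketch: source-vs-target reconciliation on count and sum.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
src = spark.read.parquet("/raw/payroll")           # assumed source extract
tgt = spark.read.parquet("/lake/silver/payroll")   # assumed loaded target

src_stats = src.agg(F.count(F.lit(1)).alias("rows"), F.sum("amount").alias("total")).first()
tgt_stats = tgt.agg(F.count(F.lit(1)).alias("rows"), F.sum("amount").alias("total")).first()

# Fail fast on mismatch; a real framework would log to an audit table instead.
assert src_stats["rows"] == tgt_stats["rows"], "row count mismatch"
assert abs(src_stats["total"] - tgt_stats["total"]) < 0.01, "control total mismatch"
```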
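A Delta merge sketch using the delta-spark DeltaTable API for an incremental upsert into a silver table; the paths and the emp_id key are hypothetical.

```python
# Hypothetical sketch: upsert a staging feed into a Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/staging/hr_updates")  # assumed staging feed

target = DeltaTable.forPath(spark, "/lake/silver/hr_employees")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.emp_id = s.emp_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```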
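A metadata-driven sketch: pipeline steps expressed as data and applied by a generic runner, so new transformations are configuration changes rather than code changes. The step schema (op/expr/from/to/name) is invented for illustration, not a reference to any specific framework.

```python
# Hypothetical sketch: a config-driven transformation runner.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# In practice this metadata would come from a repository, not a literal.
PIPELINE = [
    {"op": "filter", "expr": "status = 'ACTIVE'"},
    {"op": "rename", "from": "emp_no", "to": "emp_id"},
    {"op": "derive", "name": "load_dt", "expr": "current_date()"},
]

def apply_step(df: DataFrame, step: dict) -> DataFrame:
    if step["op"] == "filter":
        return df.filter(step["expr"])
    if step["op"] == "rename":
        return df.withColumnRenamed(step["from"], step["to"])
    if step["op"] == "derive":
        return df.withColumn(step["name"], F.expr(step["expr"]))
    raise ValueError(f"unknown op: {step['op']}")

df = spark.read.parquet("/raw/employees")  # assumed path
for step in PIPELINE:
    df = apply_step(df, step)
```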
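A join-strategy sketch: broadcasting a small dimension to avoid shuffling the large fact table, and salting a skewed key across N buckets for a large-to-large join. All names and the bucket count are illustrative.

```python
# Hypothetical sketch: broadcast vs. shuffle decisions and key salting.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
facts = spark.read.parquet("/data/facts")       # large, skewed on dept_id
dims = spark.read.parquet("/data/dim_dept")     # small enough to broadcast

# Small dimension: a broadcast join skips shuffling the fact table.
enriched = facts.join(F.broadcast(dims), "dept_id")

# Skewed large-to-large join: spread the hot key across N salt buckets.
N = 16
salted_facts = facts.withColumn("salt", (F.rand() * N).cast("int"))
salted_other = (
    spark.read.parquet("/data/other_large")
    .withColumn("salt", F.explode(F.array([F.lit(i) for i in range(N)])))
)
joined = salted_facts.join(salted_other, ["dept_id", "salt"])
```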