Role Summary
We are looking for a Data Engineer with strong expertise in the Hadoop ecosystem, ETL development, and data transformation logic, focused on modernizing IAM data flows. This role involves terminating legacy batch SQL jobs, re-pointing feeds via NDM, and pushing IAM data into a Cyber Data Lake built on Hadoop. The engineer will design and implement push-based, near real-time ingestion pipelines with transformation logic applied during ingestion, enabling scalable, secure, and audit-ready IAM datasets.
Key Responsibilities
- Modernization & Migration
- Decommission existing batch SQL jobs and migrate to modern ingestion architecture.
- Re-point upstream and downstream feeds using NDM for secure data transfers.
- Onboard IAM datasets into a Cyber Data Lake (Hadoop) with optimized storage formats (Parquet/ORC) and partitioning.
- Pipeline Development & Transformation
- Build ETL/ELT pipelines using Spark/Hive to perform transformations during ingestion (schema mapping, normalization, deduplication).
- Implement push-based, near real-time ingestion (event-driven or micro-batch) instead of scheduled pulls.
- Apply complex IAM-specific transformation logic for identities, accounts (human and non-human), roles, entitlements, and policies.
- Data Quality & Observability
- Define and automate data quality checks (completeness, accuracy, referential integrity).
- Implement monitoring, logging, and alerting for ingestion pipelines and NDM transfers.
- Performance & Optimization
- Tune Spark jobs, Hive queries, and storage strategies for scale and cost efficiency.
- Optimize resource allocation and implement backpressure controls for streaming ingestion.
- Enforce least privilege and secure handling of sensitive IAM attributes (PII).
- Maintain metadata, lineage, and data dictionaries; ensure compliance with audit requirements.
- Work onsite with client IAM teams, application owners, and auditors to clarify requirements and deliver modernization milestones.
- Maintain detailed documentation (ERDs, flow diagrams, runbooks).
Required Qualifications
- 5-8 years of experience in Data Engineering, with exposure to IAM data and modernization projects.
- Strong hands-on experience with the Hadoop ecosystem: HDFS, Hive, and Spark (SQL/Scala/PySpark).
- Proven experience in ETL/ELT design, data transformation logic, and pipeline optimization.
- Experience terminating legacy batch SQL jobs and migrating to modern ingestion patterns.
- Practical knowledge of NDM for secure data transfers.
- Expertise in push-based ingestion and near real-time data processing.
- Understanding of IAM concepts: identities, service/non-human accounts, roles, entitlements, and policies.