Project/Program
Identity & Access Management (IAM) Data Modernization: migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics & reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis).
About Program/Project
The IAM Data Modernization project involves migrating an on-premises SQL data warehouse to a target-state Data Lake in the GCP cloud environment. Key highlights include:
- Integration Scope: data ingestion from 30 source systems and multiple downstream integrations
- Capabilities: metrics, reporting, and GenAI use cases, including natural language querying, advanced pattern/trend analysis, faster summarization, and cross-domain metric monitoring
- Benefits:
  - Scalability and access to advanced cloud functionality
  - A highly available, performant semantic layer with historical data support
  - A unified data strategy for executive reporting, analytics, and GenAI across cyber domains
This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.
Required Skills
Data Lake Architecture & Storage
- Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or other layered models)
- Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
- Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
- Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
- Expertise in partitioning strategies, backfills, and large-scale data organization (see the sketch after this list)
- Ability to design data models optimized for analytics and BI consumption
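As an illustration of the partitioning and storage-layout work this role covers, here is a minimal sketch (in Python, using PyArrow) that writes a dataset to GCS as Hive-partitioned Parquet; the bucket, prefix, and column names are hypothetical.

```python
# Minimal sketch: Hive-partitioned Parquet written to GCS with PyArrow.
# The bucket, prefix, and columns below are hypothetical.
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import fs

table = pa.table({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "domain": ["iam", "iam", "network"],
    "metric_value": [10.0, 12.0, 7.0],
})

# Resolve the gs:// URI into a filesystem handle plus a path.
gcs, path = fs.FileSystem.from_uri("gs://example-lake-bronze/metrics")

# Hive-style directories (event_date=.../domain=...) let downstream query
# engines prune partitions instead of scanning the whole dataset.
ds.write_dataset(
    table,
    path,
    filesystem=gcs,
    format="parquet",
    partitioning=["event_date", "domain"],
    partitioning_flavor="hive",
    existing_data_behavior="overwrite_or_ignore",
)
```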
Qualifications
- Experience: 10-14 years in data engineering/architecture, with 5 years designing on GCP at scale; prior on-premises-to-cloud migration experience is a must.
- Education: Bachelor's/Master's in Computer Science or Information Systems, or equivalent experience.
- Certifications: Google Cloud Professional Cloud Architect (required, or obtained within 3 months). A plus: Professional Data Engineer, Security Engineer.
Data Ingestion & Orchestration
- Experience building batch and streaming ingestion pipelines using GCP-native services
- Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
- Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
- Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
- Ability to design robust error-handling, replay, and backfill mechanisms (illustrated below)
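To make the idempotency and backfill points concrete, here is a minimal Cloud Composer / Airflow sketch in which each run writes only its own logical date's partition, so retries and historical backfills simply replace that day's data; the DAG, bucket, and path names are hypothetical.

```python
# Minimal sketch: an idempotent daily ingestion DAG for Cloud Composer /
# Airflow. DAG id, bucket, and paths are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_partition(ds: str, **_) -> None:
    # `ds` is Airflow's logical date (YYYY-MM-DD). Scoping every write to
    # that date makes reruns idempotent: re-executing a day replaces only
    # that day's partition, never duplicating rows.
    print(f"load gs://example-landing/source_a/dt={ds}/ -> bronze dt={ds}")

with DAG(
    dag_id="source_a_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,  # allows historical backfills from start_date onward
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    PythonOperator(task_id="ingest_partition", python_callable=ingest_partition)
```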
Data Processing & Transformation
- Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
- Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control (see the sketch after this list)
- Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
- Advanced Python programming skills for data engineering, including testing and maintainable code design
- Experience managing schema evolution while minimizing downstream impact
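As a sketch of the partitioning, clustering, and cost-control side of BigQuery work, the snippet below creates a day-partitioned, clustered table through the Python client; the project, dataset, and schema are hypothetical.

```python
# Minimal sketch: a day-partitioned, clustered BigQuery table.
# Project, dataset, table, and schema are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "example-project.silver.iam_events",
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("domain", "STRING"),
        bigquery.SchemaField("metric_value", "FLOAT"),
    ],
)
# Partitioning bounds the bytes a date-filtered query scans (cost control);
# clustering co-locates rows by common filter columns for block pruning.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
table.clustering_fields = ["domain"]

client.create_table(table, exists_ok=True)
```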
Analytics & Data Serving
- Expertise in BigQuery performance optimization and data serving patterns
- Experience building semantic layers and governed metrics for consistent analytics
- Familiarity with BI integration, access controls, and dashboard standards
- Understanding of data exposure patterns via views, APIs, or curated datasets (see the sketch below)
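One widely used exposure pattern is the authorized view: consumers query a curated view without holding any access to the dataset beneath it. A minimal sketch, with all project, dataset, and table names hypothetical:

```python
# Minimal sketch: serve curated metrics through a BigQuery authorized view.
# All project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("example-project.serving.v_domain_metrics")
view.view_query = """
    SELECT domain, DATE(event_ts) AS event_date, AVG(metric_value) AS avg_value
    FROM `example-project.silver.iam_events`
    GROUP BY domain, event_date
"""
view = client.create_table(view, exists_ok=True)

# Authorize the view against the source dataset so readers of `serving`
# never need direct access to `silver`.
src = client.get_dataset("example-project.silver")
entries = list(src.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
src.access_entries = entries
client.update_dataset(src, ["access_entries"])
```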
Data Governance, Quality & Metadata
- Experience implementing data catalogs, metadata management, and ownership models
- Understanding of data lineage for auditability and troubleshooting
- Strong focus on data quality frameworks, including validation, freshness checks, and alerting (see the sketch after this list)
- Experience defining and enforcing data contracts, schemas, and SLAs
- Familiarity with audit logging and compliance readiness
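As a small example of the freshness-check element of such a framework, the sketch below compares a table's newest event timestamp against an SLA threshold and raises when it is stale; the table name and threshold are hypothetical.

```python
# Minimal sketch: a freshness check that alerts when data lags its SLA.
# Table name and SLA threshold are hypothetical.
from google.cloud import bigquery

FRESHNESS_SLA_HOURS = 24

def check_freshness(client: bigquery.Client) -> None:
    row = next(iter(client.query(
        """
        SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(event_ts), HOUR) AS lag_hours
        FROM `example-project.silver.iam_events`
        """
    ).result()))
    if row.lag_hours is None or row.lag_hours > FRESHNESS_SLA_HOURS:
        # In production this would raise an alert (e.g., via Cloud Monitoring).
        raise RuntimeError(f"Freshness SLA breached: lag = {row.lag_hours} hours")

check_freshness(bigquery.Client())
```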
Cloud Platform Management
- Strong hands-on experience with Google Cloud Platform (GCP), including project setup, environment separation, billing, quotas, and cost controls
- Expertise in IAM and security best practices, including least-privilege access, service accounts, and role-based access
- Solid understanding of VPC networking, private access patterns, and secure service connectivity
- Experience with encryption and key management (KMS, CMEK) and security auditing (illustrated below)
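To illustrate the CMEK item, a minimal sketch that creates a bucket whose objects default to encryption under a customer-managed key; the project, location, key ring, and key names are hypothetical.

```python
# Minimal sketch: a GCS bucket with a customer-managed encryption key (CMEK)
# as its default. Project, location, and key names are hypothetical.
from google.cloud import storage

client = storage.Client(project="example-project")

bucket = storage.Bucket(client, name="example-lake-bronze")
bucket.location = "US"
# Every object written without an explicit key is encrypted with this CMEK.
bucket.default_kms_key_name = (
    "projects/example-project/locations/us/keyRings/lake/cryptoKeys/bronze"
)
# Uniform bucket-level access keeps permissions IAM-only (no per-object ACLs).
bucket.iam_configuration.uniform_bucket_level_access_enabled = True

client.create_bucket(bucket)
```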
DevOps, Platform & Reliability
- Proven ability to build CI/CD pipelines for data and infrastructure workloads
- Experience managing secrets securely using GCP Secret Manager (see the sketch after this list)
- Ownership of observability: SLOs, dashboards, alerts, and runbooks
- Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability
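For the secrets item, a brief sketch of resolving a pipeline credential from Secret Manager at runtime instead of embedding it in code or config; the project and secret names are hypothetical.

```python
# Minimal sketch: fetch a credential from GCP Secret Manager at runtime.
# The project and secret names are hypothetical.
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/example-project/secrets/warehouse-db-password/versions/latest"
response = client.access_secret_version(request={"name": name})
password = response.payload.data.decode("utf-8")  # use, never log, this value
```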
Good to have
Security, Privacy & Compliance
- Hands-on experience implementing fine-grained access controls for BigQuery and GCS
- Experience with VPC Service Controls and data exfiltration prevention
- Knowledge of PII handling, data masking, tokenization, and audit requirements (see the sketch below)
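As a tokenization illustration, the sketch below derives a deterministic, non-reversible token from a PII value with a keyed hash, so datasets can still be joined on identity without exposing the raw value; the salt here is a placeholder that would come from Secret Manager in practice.

```python
# Minimal sketch: deterministic PII tokenization via a keyed hash (HMAC).
# The salt is a placeholder; in practice it comes from Secret Manager.
import hashlib
import hmac

SALT = b"fetched-from-secret-manager"  # hypothetical; never hard-code keys

def tokenize(value: str) -> str:
    # HMAC rather than a bare hash, so tokens cannot be reversed by
    # brute-forcing a small value space (e.g., known email addresses).
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()

masked_email = tokenize("jane.doe@example.com")  # stable join key, not PII
```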