Key Responsibilities:
- Design and implement real-time data ingestion pipelines using Pub/Sub and Kafka Streams for healthcare data formats (HL7 FHIR)
- Build a robust Bronze layer as the single source of truth, storing raw, untransformed data in Cloud Storage
- Develop streaming ingestion patterns using Dataflow for real-time data capture with minimal transformation (see the Beam sketch after this list)
- Implement batch loading processes using Dataproc for large-volume data from diverse sources (logs, databases, APIs)
- Apply schema inference and basic data type adjustments while preserving raw data lineage
- Design partitioning strategies in Cloud Storage for efficient historical data archival and retrieval
- Establish data landing zone controls including audit logging, versioning, and immutability patterns
- Create automated workflows using Cloud Composer for orchestrating ingestion pipelines (see the DAG sketch after this list)
- Implement data catalog and metadata management for raw data assets
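The streaming and partitioning bullets above can be illustrated with a minimal Apache Beam (Python) sketch: it reads raw messages from a Pub/Sub topic, groups them into one-minute windows, and writes each window untouched to a date-partitioned Bronze prefix in Cloud Storage. The project, topic, bucket name, and window size are placeholder assumptions, and the structure loosely follows Google's public Pub/Sub-to-GCS Dataflow example rather than any pipeline specific to this role.

```python
# Minimal Pub/Sub -> Cloud Storage (Bronze) streaming sketch.
# Topic, bucket, and window size are placeholders, not values from this posting.
import json
from datetime import timezone

import apache_beam as beam
from apache_beam import DoFn, GroupByKey, ParDo, WindowInto, WithKeys
from apache_beam.io.gcsio import GcsIO
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


class WriteWindowToBronze(DoFn):
    """Write one windowed batch of raw messages to a date-partitioned GCS prefix."""

    def __init__(self, bucket):
        self.bucket = bucket

    def process(self, key_value, window=DoFn.WindowParam):
        shard, messages = key_value
        start = window.start.to_utc_datetime().replace(tzinfo=timezone.utc)
        # Hive-style date partitioning keeps historical retrieval cheap and predictable.
        path = (
            f"gs://{self.bucket}/bronze/fhir/"
            f"ingest_date={start:%Y-%m-%d}/hour={start:%H}/"
            f"{start:%M%S}-shard-{shard}.jsonl"
        )
        with GcsIO().open(path, mode="w") as f:
            for raw in messages:
                # Store the payload byte-for-byte; only ingestion metadata is added.
                record = {"raw": raw.decode("utf-8"), "window_start": start.isoformat()}
                f.write((json.dumps(record) + "\n").encode("utf-8"))


def run(topic="projects/my-project/topics/fhir-events", bucket="my-bronze-bucket"):
    options = PipelineOptions(streaming=True, save_main_session=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadRaw" >> beam.io.ReadFromPubSub(topic=topic)
            | "Window1m" >> WindowInto(FixedWindows(60))
            | "Shard" >> WithKeys(lambda _: 0)  # single shard keeps the sketch simple
            | "Batch" >> GroupByKey()
            | "WriteBronze" >> ParDo(WriteWindowToBronze(bucket))
        )


if __name__ == "__main__":
    run()
```

Keeping the payload untouched and adding only ingestion metadata is what preserves the Bronze layer as a replayable source of truth.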
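For the orchestration bullet, a minimal Cloud Composer (Airflow 2.4+ style) DAG sketch is shown below: a daily DAG that submits a PySpark batch-ingestion job to Dataproc, tying the batch-loading and orchestration responsibilities together. The project, region, cluster, and script URI are hypothetical placeholders.

```python
# Hypothetical daily orchestration of a Dataproc batch-ingestion job from Cloud Composer.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT_ID = "my-project"        # placeholder
REGION = "us-central1"           # placeholder
CLUSTER_NAME = "batch-ingest"    # placeholder

PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {
        # Hypothetical script that lands raw log/database/API extracts in the Bronze bucket.
        "main_python_file_uri": "gs://my-artifacts-bucket/jobs/batch_ingest.py",
        "args": ["--run-date", "{{ ds }}"],  # Airflow-templated execution date
    },
}

with DAG(
    dag_id="bronze_batch_ingestion",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["bronze", "ingestion"],
) as dag:
    submit_batch_ingest = DataprocSubmitJobOperator(
        task_id="submit_batch_ingest",
        project_id=PROJECT_ID,
        region=REGION,
        job=PYSPARK_JOB,
    )
```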
Required Skills:
- 5 years of experience with GCP services (Cloud Storage, Pub/Sub, Dataflow, Dataproc, Cloud Composer)
- Strong expertise in Apache Kafka, Kafka Streams, and event-driven architectures
- Proficiency in Python and/or Java for data pipeline development using Apache Beam SDK
- Experience with healthcare data standards (HL7 FHIR) and handling semi-structured data
- Hands-on experience with streaming frameworks (Apache Beam, Dataflow) for near-real-time ingestion
- Knowledge of file formats and compression (JSON, Avro, Parquet) for raw data storage
- Understanding of CDC patterns, incremental loading, and data versioning strategies
- Experience with Cloud Storage lifecycle management and cost optimization (see the lifecycle sketch after this list)
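As a sketch for the lifecycle-management skill above, the google-cloud-storage client can express age-based storage-class transitions and deletion on a Bronze bucket. The bucket name, age thresholds, and the seven-year retention figure are illustrative assumptions, not policy from this posting.

```python
# Illustrative lifecycle rules for a Bronze bucket; all names and ages are assumptions.
from google.cloud import storage


def apply_bronze_lifecycle(bucket_name: str = "my-bronze-bucket") -> None:
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)

    # Step raw objects down through cheaper storage classes as they age,
    # then delete them once the (assumed) retention window has passed.
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
    bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
    bucket.add_lifecycle_delete_rule(age=2555)  # ~7 years, an assumed retention period

    bucket.patch()  # persist the updated lifecycle configuration


if __name__ == "__main__":
    apply_bronze_lifecycle()
```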
Preferred Qualifications:
- GCP Professional Data Engineer certification
- Experience with Confluent Platform or a managed Kafka offering on Google Cloud
- Familiarity with healthcare compliance requirements (HIPAA) and data residency
- Background in log aggregation platforms (Fluentd, Logstash) and observability
- Knowledge of data lake security patterns and IAM controls
Additional Information:
- Remote Work: Yes
- Employment Type: Full-time