Key Responsibilities:
- Design and implement real-time data ingestion pipelines using Pub/Sub and Kafka Streams for healthcare data formats (HL7 FHIR)
- Build a robust Bronze layer as the single source of truth, storing raw, untransformed data in Cloud Storage
- Develop streaming ingestion patterns using Dataflow for real-time data capture with minimal transformation (see the Beam sketch after this list)
- Implement batch loading processes using Dataproc for large-volume data from diverse sources (logs, databases, APIs)
- Apply schema inference and basic data type adjustments while preserving raw data lineage
- Design partitioning strategies in Cloud Storage for efficient historical data archival and retrieval
- Establish data landing zone controls including audit logging, versioning, and immutability patterns
- Create automated workflows using Cloud Composer for orchestrating ingestion pipelines (see the DAG sketch after this list)
- Implement data catalog and metadata management for raw data assets
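The streaming and partitioning bullets above can be illustrated with a minimal Apache Beam (Python) sketch: it reads raw messages from a Pub/Sub topic, groups them into one-minute windows, and writes each window untouched to a date-partitioned Bronze prefix in Cloud Storage. The project, topic, bucket name, and window size are placeholder assumptions, and the structure loosely follows Google's public Pub/Sub-to-GCS Dataflow example rather than any pipeline specific to this role.

```python
# Minimal Pub/Sub -> Cloud Storage (Bronze) streaming sketch.
# Topic, bucket, and window size are placeholders, not values from this posting.
import json
from datetime import timezone

import apache_beam as beam
from apache_beam import DoFn, GroupByKey, ParDo, WindowInto, WithKeys
from apache_beam.io.gcsio import GcsIO
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


class WriteWindowToBronze(DoFn):
    """Write one windowed batch of raw messages to a date-partitioned GCS prefix."""

    def __init__(self, bucket):
        self.bucket = bucket

    def process(self, key_value, window=DoFn.WindowParam):
        shard, messages = key_value
        start = window.start.to_utc_datetime().replace(tzinfo=timezone.utc)
        # Hive-style date partitioning keeps historical retrieval cheap and predictable.
        path = (
            f"gs://{self.bucket}/bronze/fhir/"
            f"ingest_date={start:%Y-%m-%d}/hour={start:%H}/"
            f"{start:%M%S}-shard-{shard}.jsonl"
        )
        with GcsIO().open(path, mode="w") as f:
            for raw in messages:
                # Store the payload byte-for-byte; only ingestion metadata is added.
                record = {"raw": raw.decode("utf-8"), "window_start": start.isoformat()}
                f.write((json.dumps(record) + "\n").encode("utf-8"))


def run(topic="projects/my-project/topics/fhir-events", bucket="my-bronze-bucket"):
    options = PipelineOptions(streaming=True, save_main_session=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadRaw" >> beam.io.ReadFromPubSub(topic=topic)
            | "Window1m" >> WindowInto(FixedWindows(60))
            | "Shard" >> WithKeys(lambda _: 0)  # single shard keeps the sketch simple
            | "Batch" >> GroupByKey()
            | "WriteBronze" >> ParDo(WriteWindowToBronze(bucket))
        )


if __name__ == "__main__":
    run()
```

Keeping the payload untouched and adding only ingestion metadata is what preserves the Bronze layer as a replayable source of truth.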
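For the orchestration bullet, a minimal Cloud Composer (Airflow 2.4+ style) DAG sketch is shown below: a daily DAG that submits a PySpark batch-ingestion job to Dataproc, tying the batch-loading and orchestration responsibilities together. The project, region, cluster, and script URI are hypothetical placeholders.

```python
# Hypothetical daily orchestration of a Dataproc batch-ingestion job from Cloud Composer.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT_ID = "my-project"        # placeholder
REGION = "us-central1"           # placeholder
CLUSTER_NAME = "batch-ingest"    # placeholder

PYSPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "pyspark_job": {
        # Hypothetical script that lands raw log/database/API extracts in the Bronze bucket.
        "main_python_file_uri": "gs://my-artifacts-bucket/jobs/batch_ingest.py",
        "args": ["--run-date", "{{ ds }}"],  # Airflow-templated execution date
    },
}

with DAG(
    dag_id="bronze_batch_ingestion",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["bronze", "ingestion"],
) as dag:
    submit_batch_ingest = DataprocSubmitJobOperator(
        task_id="submit_batch_ingest",
        project_id=PROJECT_ID,
        region=REGION,
        job=PYSPARK_JOB,
    )
```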
Required Skills:
- 5 years of experience with GCP services (Cloud Storage, Pub/Sub, Dataflow, Dataproc, Cloud Composer)
- Strong expertise in Apache Kafka, Kafka Streams, and event-driven architectures
- Proficiency in Python and/or Java for data pipeline development using Apache Beam SDK
- Experience with healthcare data standards (HL7 FHIR) and handling semi-structured data
- Hands-on experience with streaming frameworks (Apache Beam, Dataflow) for near-real-time ingestion
- Knowledge of file formats and compression (JSON, Avro, Parquet) for raw data storage
- Understanding of CDC patterns, incremental loading, and data versioning strategies
- Experience with Cloud Storage lifecycle management and cost optimization (see the lifecycle sketch after this list)
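As a sketch for the lifecycle-management skill above, the google-cloud-storage client can express age-based storage-class transitions and deletion on a Bronze bucket. The bucket name, age thresholds, and the seven-year retention figure are illustrative assumptions, not policy from this posting.

```python
# Illustrative lifecycle rules for a Bronze bucket; all names and ages are assumptions.
from google.cloud import storage


def apply_bronze_lifecycle(bucket_name: str = "my-bronze-bucket") -> None:
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)

    # Step raw objects down through cheaper storage classes as they age,
    # then delete them once the (assumed) retention window has passed.
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
    bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
    bucket.add_lifecycle_delete_rule(age=2555)  # ~7 years, an assumed retention period

    bucket.patch()  # persist the updated lifecycle configuration


if __name__ == "__main__":
    apply_bronze_lifecycle()
```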
Preferred Qualifications:
- GCP Professional Data Engineer certification
- Experience with Confluent Platform or a managed Kafka offering on Google Cloud
- Familiarity with healthcare compliance requirements (HIPAA) and data residency
- Background in log aggregation platforms (Fluentd, Logstash) and observability
- Knowledge of data lake security patterns and IAM controls
Additional Information:
- Remote Work: Yes
- Employment Type: Full-time