GCP Data Engineer

Sutherland

Job Location: Hyderabad, India
Monthly Salary: Not Disclosed
Posted on: 13 hours ago
Vacancies: 1 Vacancy

Job Summary

Key Responsibilities:

  • Design and implement real-time data ingestion pipelines using Pub/Sub and Kafka Streams for healthcare data formats (HL7 FHIR)
  • Build a robust Bronze layer as the single source of truth, storing raw, untransformed data in Cloud Storage
  • Develop streaming ingestion patterns using Dataflow for real-time data capture with minimal transformation (see the Beam sketch after this list)
  • Implement batch loading processes using Dataproc for large-volume data from diverse sources (logs, databases, APIs)
  • Apply schema inference and basic data type adjustments while preserving raw data lineage
  • Design partitioning strategies in Cloud Storage for efficient historical data archival and retrieval
  • Establish data landing zone controls, including audit logging, versioning, and immutability patterns
  • Create automated workflows using Cloud Composer to orchestrate ingestion pipelines
  • Implement data catalog and metadata management for raw data assets
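
For illustration only (not taken from the posting): a minimal Apache Beam sketch of the streaming ingestion pattern described above, reading raw FHIR events from Pub/Sub and landing them untransformed in a Cloud Storage Bronze zone. The project ID, topic, and bucket names are hypothetical placeholders.

```python
# Illustrative sketch only; not from the posting. Reads raw FHIR event
# payloads from Pub/Sub and lands them untransformed in a Cloud Storage
# Bronze zone. Project, topic, and bucket names are hypothetical.
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows


def run():
    options = PipelineOptions(streaming=True)  # unbounded Pub/Sub source

    with beam.Pipeline(options=options) as p:
        (
            p
            # Read raw message bytes from a (hypothetical) Pub/Sub topic
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/fhir-events")
            | "DecodeUtf8" >> beam.Map(lambda msg: msg.decode("utf-8"))
            # Window the unbounded stream so files can be finalized per minute
            | "OneMinuteWindows" >> beam.WindowInto(FixedWindows(60))
            # Write messages as-is: the Bronze layer stores untransformed data
            | "WriteRawToGCS" >> fileio.WriteToFiles(
                path="gs://example-bronze-bucket/raw/fhir/",
                sink=lambda dest: fileio.TextSink(),
                file_naming=fileio.default_file_naming(
                    prefix="fhir-raw", suffix=".json"))
        )


if __name__ == "__main__":
    run()
```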

Required Skills:

  • 5 years of experience with GCP services (Cloud Storage, Pub/Sub, Dataflow, Dataproc, Cloud Composer)
  • Strong expertise in Apache Kafka, Kafka Streams, and event-driven architectures
  • Proficiency in Python and/or Java for data pipeline development using the Apache Beam SDK
  • Experience with healthcare data standards (HL7 FHIR) and handling semi-structured data
  • Hands-on experience with streaming frameworks (Apache Beam, Dataflow) for near-real-time ingestion
  • Knowledge of file formats and compression (JSON, Avro, Parquet) for raw data storage
  • Understanding of CDC patterns, incremental loading, and data versioning strategies
  • Experience with Cloud Storage lifecycle management and cost optimization (see the lifecycle sketch after this list)
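
As a hedged illustration of the lifecycle item above, this sketch uses the google-cloud-storage Python client to demote aging Bronze objects to cheaper storage classes and eventually delete them. The bucket name and retention ages are assumptions, not requirements stated in the posting.

```python
# Illustrative sketch only; bucket name and retention ages are assumptions.
# Ages Bronze objects into cheaper storage classes, then deletes them.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-bronze-bucket")  # hypothetical bucket

# Demote raw objects to Nearline after 30 days and Coldline after 90 days
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
# Delete raw objects after roughly seven years (assumed retention period)
bucket.add_lifecycle_delete_rule(age=2555)

bucket.patch()  # persist the updated lifecycle configuration
```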

Preferred Qualifications:

  • GCP Professional Data Engineer certification
  • Experience with Confluent Platform or Google Cloud managed Kafka (if applicable)
  • Familiarity with healthcare compliance requirements (HIPAA) and data residency
  • Background in log aggregation platforms (Fluentd, Logstash) and observability
  • Knowledge of data lake security patterns and IAM controls (see the IAM sketch after this list)
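
For the IAM item above, a minimal sketch of one common data lake control: granting a read-only role on the raw-data bucket to a pipeline service account via the google-cloud-storage IAM API. The service account and bucket names are hypothetical.

```python
# Illustrative sketch only; service account and bucket are hypothetical.
# Grants a pipeline service account read-only access to the raw-data bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-bronze-bucket")  # hypothetical bucket

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.version = 3
policy.bindings.append({
    "role": "roles/storage.objectViewer",  # read-only on objects
    "members": {"serviceAccount:reader@example-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```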

Remote Work: Yes

Employment Type: Full-time

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company

Sutherland is seeking an organized and reliable person to join us as Admin Specialist. We are a group of driven and supportive individuals. If you are looking to build a fulfilling career and are confident you have the skills and experience to help us succeed, we want to work with you ...
