Job Description
Key Responsibilities
Design, develop, and optimize ETL pipelines using PySpark on Google Cloud Platform (GCP).
Work with BigQuery, Cloud Dataflow, Cloud Composer (Apache Airflow), and Cloud Storage for data transformation and orchestration.
Develop and optimize Spark-based ETL processes for large-scale data processing.
Implement best practices for data governance, security, and monitoring in a cloud environment.
Collaborate with data engineers, analysts, and business stakeholders to understand data requirements.
Troubleshoot performance bottlenecks and optimize Spark jobs for efficient execution.
Automate data workflows using Apache Airflow or Cloud Composer.
Ensure data quality, validation, and consistency across pipelines.
Qualifications
5 years of experience in ETL development with a focus on PySpark.
Strong hands-on experience with Google Cloud Platform (GCP) services, including:
BigQuery
Cloud Dataflow / Apache Beam
Cloud Composer (Apache Airflow)
Cloud Storage
Proficiency in Python and PySpark for big data processing.
Experience with data lake architectures and data warehousing concepts.
Knowledge of SQL for data querying and transformation.
Experience with CI/CD pipelines for data pipeline automation.
Strong debugging and problem-solving skills.
Experience with Kafka or Pub/Sub for real-time data processing.
Knowledge of Terraform for infrastructure automation on GCP.
Experience with containerization (Docker, Kubernetes).
Familiarity with DevOps and monitoring tools such as Prometheus, Stackdriver, or Datadog.
Skills: GCP, PySpark, ETL
Required Skills:
Workflows, BigQuery, Kubernetes, Cloud Storage, Docker, Apache Airflow