Job Description
Advanced working SQL knowledge and experience with relational databases, including SQL query authoring, as well as working familiarity with a variety of databases.
Extensive experience with the BigQuery, Dataproc, and Dataflow platforms on Google Cloud Platform. Experience with Azure Databricks is an added advantage (not mandatory).
Experience with cluster capacity configuration and cloud optimization to meet application demand.
Programming experience with Python, shell scripting, PySpark, and other data programming languages.
Programming experience with the Apache Beam Java SDK for building effective, high-volume data pipelines and deploying them to GCP Dataflow, including CI/CD processes to deploy these pipelines in GCP.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Strong analytic skills related to working with data visualization, dashboards, and metrics.
Build processes supporting data transformation, data structures, metadata, dependency, and workload management.
A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
Working knowledge of message queuing, stream processing, and highly scalable "big data" data stores.
Familiarity with deployment tools such as Docker and with building CI/CD pipelines.
Experience supporting and working with cross-functional teams in a dynamic environment.
8 years of experience in software development and data engineering.
Bachelor's degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field; a postgraduate/master's degree is preferred.
Experience in Machine Learning and Data Modeling is a plus.
The ideal resource would be local to the Dallas, TX area so they could be in office 1-2 days per week. They would have the extensive experience described above with BigQuery, Dataproc, and Dataflow on Google Cloud Platform; Python, shell scripting, and PySpark; and the Apache Beam Java SDK, including CI/CD deployment of pipelines to GCP Dataflow. Azure Databricks experience is an added advantage (not mandatory).