Key Responsibilities:
- Develop and optimize data pipelines in Databricks for transforming and processing data from various sources.
- Integrate data using Unity Catalog and external data sources (data lakes, APIs, etc.).
- Write Spark SQL and PySpark scripts for data transformations, optimizations, and the creation of views/procedures (see the sketch after this list).
- Perform data analysis to identify quality issues, optimize pipelines, and enhance data processing for analytics.
- Collaborate with frontend teams on report generation and dashboard creation.
- Use GitLab for version control and CI/CD automation, and Jira for task management.
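
For illustration, here is a minimal sketch of the kind of PySpark/Spark SQL transformation-and-view work described above. The catalog, table, and column names (e.g. main.raw.encounters) are hypothetical placeholders, not an actual schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in Databricks notebooks

# Read a raw table registered in Unity Catalog (hypothetical name)
raw = spark.table("main.raw.encounters")

# Typical transformation step: deduplicate and derive a date column
curated = (
    raw.dropDuplicates(["encounter_id"])
       .withColumn("encounter_date", F.to_date("encounter_ts"))
       .filter(F.col("encounter_date").isNotNull())
)

# Persist as a table for downstream analytics
curated.write.mode("overwrite").saveAsTable("main.curated.encounters_daily")

# Expose a reporting view via Spark SQL
spark.sql("""
    CREATE OR REPLACE VIEW main.curated.v_encounters_by_day AS
    SELECT encounter_date, COUNT(*) AS encounter_count
    FROM main.curated.encounters_daily
    GROUP BY encounter_date
""")
```
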
Required Skills and Qualifications:
- Master's or Bachelor's degree in Data Engineering or a related field.
- 3 years of experience with Databricks and advanced SQL.
- Experience with ETL processes, views, and procedures.
- Strong experience with Databricks, Spark SQL, PySpark, and SQL.
- Expertise in creating and optimizing views and stored procedures in Databricks.
- Experience building ETL workflows and data models.
- Knowledge of cloud platforms (AWS, Azure) and version control tools (Git, GitLab).
- Experience with healthcare or clinical trial data.
- Familiarity with DevOps practices.