Job Summary:
We are seeking a Senior Data Engineer (Databricks) with a strong development background in Azure Databricks and Python, who will be instrumental in building and optimising scalable data pipelines and solutions across the Azure ecosystem. This role requires hands-on development experience with PySpark, data modelling, and Azure Data Factory. You will collaborate closely with data architects, analysts, and business stakeholders to ensure reliable and high-performance data solutions.
Experience Required: 4 Years
Senior Data Engineer (Microsoft Azure, Databricks, Data Factory, Data Engineering, Data Modelling)
Key Responsibilities:
- Develop and Maintain Data Pipelines: Design, implement, and optimise scalable data pipelines using Azure Databricks (PySpark) for both batch and streaming use cases.
- Azure Platform Integration: Work extensively with Azure services including Data Factory, ADLS Gen2, Delta Lake, and Azure Synapse for end-to-end data pipeline orchestration and storage.
- Data Transformation & Processing: Write efficient, maintainable, and reusable PySpark code for data ingestion, transformation, and validation processes within the Databricks environment.
- Collaboration: Partner with data architects, analysts, and data scientists to understand requirements and deliver robust, high-quality data solutions.
- Performance Tuning and Optimisation: Optimise Databricks cluster configurations, notebook performance, and resource consumption to ensure cost-effective and efficient data processing.
- Testing and Documentation: Implement unit and integration tests for data pipelines. Document solutions, processes, and best practices to enable team growth and maintainability.
- Security and Compliance: Ensure data governance, privacy, and compliance are upheld across all engineered solutions, following Azure security best practices.
Preferred Skills:
- Strong hands-on experience with Delta Lake, including table management, schema evolution, and implementing ACID-compliant pipelines.
- Skilled in developing and maintaining Databricks notebooks and jobs for large-scale batch and streaming data processing.
- Experience writing modular, production-grade PySpark and Python code, including reusable functions and libraries for data transformation.
- Experience in streaming data ingestion and Structured Streaming in Databricks for near real-time data solutions.
- Knowledge of performance tuning techniques in Spark, including job optimisation, caching, and partitioning strategies.
- Exposure to data quality frameworks and testing practices (e.g. pytest, data validation libraries, custom assertions).
- Basic understanding of Unity Catalog for managing data governance, access controls, and lineage tracking from a developer's perspective.
- Familiarity with Power BI - able to structure data models and views in Databricks or Synapse to support BI consumption.
Required Experience:
Senior IC