Job Summary:
We are seeking a highly skilled and experienced Senior Data Engineer with a strong background in Databricks, SQL, and Python (PySpark) to join our data engineering team. The ideal candidate will have a proven track record of designing, building, and deploying scalable data pipelines and solutions in cloud environments. You will be responsible for end-to-end development, from data ingestion to deployment, ensuring high performance and reliability.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines using Databricks and Apache Spark.
- Write efficient, optimized SQL queries for data extraction, transformation, and analysis.
- Develop robust data processing scripts and automation using Python (PySpark).
- Implement end-to-end data solutions, including ingestion, transformation, storage, and deployment.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements.
- Optimize data workflows for performance, scalability, and reliability.
- Ensure data quality, integrity, and governance across all stages of the pipeline.
- Monitor and troubleshoot production data pipelines and deployments.
- Document technical designs, processes, and best practices.
Required Qualifications:
- 5 years of professional experience in data engineering or related roles.
- Strong proficiency in Databricks, SQL, and Python (PySpark).
- Experience with end-to-end deployment of data solutions in cloud environments (e.g., Azure, AWS, GCP).
- Solid understanding of ETL/ELT processes, data modeling, and data warehousing concepts.
- Familiarity with CI/CD pipelines, version control (Git), and workflow orchestration tools (e.g., Airflow).
- Experience with structured and unstructured data formats (e.g., Parquet, JSON, CSV).
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
Preferred Qualifications:
- Experience with Delta Lake or other Databricks ecosystem tools.
- Knowledge of data governance, security, and compliance standards.
- Familiarity with containerization (Docker) and Kubernetes.
- Exposure to real-time data processing (e.g., Kafka, Spark Structured Streaming).