Role: Databricks Engineer
Work Location: Remote/CA
Project Duration: 1 year
Engagement Type: Remote; candidate must be based in the PST time zone
Primary Skill Set:
Databricks, Apache Spark, Python, SQL, Scala (optional), ETL/ELT development, Delta Lake, Cloud platforms (AWS, Azure, GCP), Data modeling, Cross-functional collaboration, Communication
Secondary Skill Set:
Airflow, dbt, Kafka, Hadoop, MLflow, Unity Catalog, Delta Live Tables, Cluster optimization, Data governance, Security and compliance, Databricks certifications
Required Qualifications:
Experience: 5 years in data engineering with hands-on experience using Databricks and Apache Spark.
Programming Skills: Proficiency in Python and SQL; experience with Scala is a plus.
Cloud Platforms: Strong experience with cloud services such as AWS (e.g., S3, Glue, Redshift), Azure (e.g., Data Factory, Synapse), or GCP.
Data Engineering Tools: Familiarity with tools such as Airflow, Kafka, and dbt.
Data Modeling: Experience in designing data models for analytics and machine learning applications.
Collaboration: Proven ability to work in cross-functional teams and communicate effectively with non-technical stakeholders.
Key Responsibilities:
Data Pipeline Development: Design and implement robust ETL/ELT pipelines using Databricks, PySpark, and Delta Lake to process structured and unstructured data efficiently (an illustrative sketch follows this list).
Performance Optimization: Tune and optimize Databricks clusters and notebooks for performance, scalability, and cost-efficiency.
Collaboration: Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver solutions that meet business needs.
Cloud Integration: Leverage cloud platforms (AWS, Azure, GCP) to build and deploy data solutions, ensuring seamless integration with existing infrastructure.
Data Modeling: Develop and maintain data models that support analytics and machine learning workflows.
Automation & Monitoring: Implement automated testing, monitoring, and alerting mechanisms to ensure data pipeline reliability and data quality.
Documentation & Best Practices: Maintain comprehensive documentation of data workflows and adhere to best practices in coding, version control, and data governance.
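The sketch below is a minimal, hypothetical illustration of the kind of ETL/ELT work described above: it reads raw JSON from cloud storage, applies light cleanup, and writes a Delta Lake table. The bucket path, table name, and column names are placeholders (not details from this posting), and it assumes a Databricks or other Delta Lake-enabled Spark environment.

```python
# Minimal illustrative sketch only: the storage path, table name, and column
# names are hypothetical placeholders, not details from this posting.
# Assumes a Databricks (or other Delta Lake-enabled) Spark environment.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl_sketch").getOrCreate()

# Extract: raw JSON files landed in cloud object storage (placeholder path).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: deduplicate, enforce types, and drop records missing a key.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write a partitioned Delta Lake table for downstream analytics.
(
    cleaned.write.format("delta")
           .mode("overwrite")
           .partitionBy("order_date")
           .saveAsTable("analytics.orders_clean")
)
```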