Key Responsibilities
Data Pipeline Development: Design and implement robust ETL/ELT pipelines using Databricks, PySpark, and Delta Lake to process structured and unstructured data efficiently.
Performance Optimization: Tune and optimize Databricks clusters and notebooks for performance, scalability, and cost-efficiency.
Collaboration: Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver solutions that meet business needs.
Cloud Integration: Leverage cloud platforms (AWS, Azure, GCP) to build and deploy data solutions, ensuring seamless integration with existing infrastructure.
Data Modeling: Develop and maintain data models that support analytics and machine learning workflows.
Automation & Monitoring: Implement automated testing, monitoring, and alerting mechanisms to ensure data pipeline reliability and data quality.
Documentation & Best Practices: Maintain comprehensive documentation of data workflows and adhere to best practices in coding, version control, and data governance.
Required Qualifications
Experience: 5 years in data engineering with hands-on experience using Databricks and Apache Spark.
Programming Skills: Proficiency in Python and SQL; experience with Scala is a plus.
Cloud Platforms: Strong experience with cloud services such as AWS (e.g. S3, Glue, Redshift), Azure (e.g. Data Factory, Synapse), or GCP.
Data Engineering Tools: Familiarity with tools like Airflow, Kafka, and dbt.
Data Modeling: Experience in designing data models for analytics and machine learning applications.
Collaboration: Proven ability to work in cross-functional teams and communicate effectively with non-technical stakeholders.
Primary Skill Set
Databricks, Apache Spark, Python, SQL, Scala (optional), ETL/ELT development, Delta Lake, Cloud platforms (AWS, Azure, GCP), Data modeling, Cross-functional collaboration, Communication
Secondary Skill Set
Airflow, dbt, Kafka, Hadoop, MLflow, Unity Catalog, Delta Live Tables, Cluster optimization, Data governance, Security and compliance, Databricks certifications
HR
Xlysi LLC Expert Portal Solutions, 251 Milwaukee Ave, Buffalo Grove, IL 60089
Web:
E-mail:
Our training portal registration: