Data Engineer with Databricks

CX Data Labs

Job Location:

Dallas, IA - USA

Monthly Salary: Not Disclosed
Posted on: 30+ days ago
Vacancies: 1

Job Summary

Key Responsibilities:

Develop, maintain, and optimize scalable ETL/ELT pipelines using PySpark and Databricks (a minimal sketch follows this list).

Collaborate with cross-functional teams to design and implement data models and data integration solutions.

Create and maintain robust SQL scripts for querying, transforming, and analyzing data.

Work on Databricks to manage big data workloads and ensure optimal performance for large-scale datasets.

Ensure data quality, integrity, and governance across the organization's data assets.

Automate data workflows and deploy reliable solutions for real-time data processing.

Debug and troubleshoot performance issues with data pipelines and implement enhancements.

Stay up-to-date with emerging trends and best practices in data engineering and big data technologies.
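
Below is a minimal, illustrative PySpark sketch of the kind of ETL pipeline referenced above. The table name, columns, and output path are assumptions rather than details from this posting, and it presumes a Databricks cluster with Delta Lake available.

```python
# Hypothetical ETL sketch: read a raw table, derive a daily aggregate,
# and write it out as a partitioned Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_events_etl").getOrCreate()

# Extract: read raw data from an assumed metastore table.
raw = spark.read.table("raw_events")

# Transform: basic cleansing plus a daily count per event type.
daily = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Load: write a Delta table partitioned by date (placeholder path).
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("/mnt/curated/daily_event_counts"))
```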

Required Skills and Qualifications:

Educational Background:

Bachelor's or Master's degree in Computer Science, Information Technology, Data Science, or a related field.

Certifications in Databricks, Azure, or related technologies are a plus.

Technical Skills:

Proficiency in SQL for complex queries, database design, and optimization.

Strong experience with PySpark for data transformation and processing.

Hands-on experience with Databricks for building and managing big data solutions.

Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.

Knowledge of data warehousing concepts and tools (e.g., Snowflake, Redshift).

Experience with data versioning and orchestration tools such as Git, Airflow, or Dagster (see the orchestration sketch after this list).

Solid understanding of Big Data ecosystems (Hadoop, Hive, etc.).
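
As one way the orchestration tools listed above fit together, here is a hedged Airflow sketch (Airflow 2.4+ with the apache-airflow-providers-databricks package) that triggers an existing Databricks job on a daily schedule; the job id, connection id, and parameters are placeholders.

```python
# Hypothetical Airflow DAG: trigger a pre-existing Databricks job once a day.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="daily_event_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",          # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    run_etl = DatabricksRunNowOperator(
        task_id="run_databricks_etl_job",
        databricks_conn_id="databricks_default",  # assumed Airflow connection id
        job_id=123,                               # placeholder Databricks job id
        notebook_params={"run_date": "{{ ds }}"},
    )
```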

Preferred Qualifications:

7 years of relevant work experience in data engineering, or an equivalent background in software engineering.

3 years of experience implementing big data processing technologies (AWS/Azure/GCP, Apache Spark, Python).

Experience writing and optimizing SQL queries in a business environment with large-scale, complex datasets (illustrated in the sketch below).
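
The snippet below sketches, purely for illustration, the kind of SQL tuning this calls for on Spark: partition pruning via a date filter and a broadcast hint for a small dimension table. All table and column names are assumed, not taken from the posting.

```python
# Hypothetical Spark SQL query over large, partitioned fact data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

result = spark.sql("""
    SELECT /*+ BROADCAST(d) */
           f.event_date,
           d.customer_segment,
           COUNT(*) AS events
    FROM   fact_events f
    JOIN   dim_customers d
      ON   f.customer_id = d.customer_id
    WHERE  f.event_date >= DATE '2024-01-01'   -- date filter enables partition pruning
    GROUP BY f.event_date, d.customer_segment
""")
result.show()
```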

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala