Intermediate Senior Data Engineer (Databricks)

Employer Active

Job Location

Centurion - South Africa

Monthly Salary

Not Disclosed

Vacancy

1 Vacancy

Job Description

We are looking for a Data Engineer who is certified in Databricks (required) to join our team. In this role, you will design, develop, and optimize scalable data pipelines and workflows on Databricks. The engineer will work closely with stakeholders to ensure data reliability, performance, and alignment with business requirements.

Scope of Work

Data Pipeline Development:

  • Building efficient ETL/ELT pipelines using Databricks and Delta Lake for structured, semi-structured, and unstructured data.
  • Transforming raw data into consumable datasets for analytics and machine learning.

Data Optimization:

  • Improving performance by implementing best practices like partitioning, caching, and Delta Lake optimizations.
  • Resolving bottlenecks and ensuring scalability.

Data Integration:

  • Integrating data from various sources such as APIs, databases, and cloud storage systems (e.g. AWS S3, Azure Data Lake).

Real-Time Streaming:

  • Designing and deploying real-time data streaming solutions using Databricks Structured Streaming.

Data Quality and Governance:

  • Implementing data validation, schema enforcement, and monitoring to ensure high-quality data delivery.
  • Using Unity Catalog to manage metadata, access permissions, and data lineage.

Collaboration and Documentation:

  • Collaborating with data analysts, data scientists, and other stakeholders to meet business needs.
  • Documenting pipelines, workflows, and technical solutions.

Responsibilities

Fully functional and documented data pipelines.

Optimized and scalable data workflows on Databricks.

Real-time streaming solutions integrated with downstream systems.

Detailed documentation for implemented solutions and best practices.

Skills and Qualifications

Proficiency in Databricks (certified), Spark, and Delta Lake.

Strong experience with Python, SQL, and ETL/ELT development.

Familiarity with real-time data processing and streaming.

Knowledge of cloud platforms (e.g. AWS, Azure, GCP).

Experience with data governance and tools like Unity Catalog.

Assumptions

Access to necessary datasets and cloud infrastructure will be provided.

Timely input and feedback from stakeholders.

Success Metrics

Data pipelines deliver accurate and consistent data.

Workflows meet performance benchmarks.

Real-time streaming solutions operate with minimal latency.

Stakeholders are satisfied with the quality and usability of the solutions.

Employment Type

CONTRACT
