Lead Data Engineer

Belay Talent Solutions

Job Location:

Johannesburg - South Africa

Monthly Salary: R 650 - 700

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Position: Lead Data Engineer

Contract Type: Fixed term / Contract

Contract Duration: Start Date: 25 May 2026; End Date: December 2026

Work Model: Hybrid (2-3 days a week)

Work Location: Sandton, Johannesburg, South Africa (Hybrid / Office-based as required)

Role Overview

We are seeking a Lead / Senior Data Engineer to design, build, and operate modern Databricks and Lakehouse data platforms that support advanced analytics, AI, and Generative AI use cases.

This is a senior individual contributor role operating within product-aligned, cross-functional squads. The successful candidate will deliver high-quality, governed, scalable data assets consumed by analytics platforms, machine learning models, and Generative AI solutions, including LLM- and agent-based systems.

Key Responsibilities

1. Databricks & Data Platform Engineering

Design, build, and operate data solutions using Databricks, including:

Delta Lake
Databricks Jobs and Workflows
Unity Catalog
Notebooks and shared libraries
Develop scalable, reliable Lakehouse architectures supporting analytics and AI workloads.

2. Data Enablement & Consumption

Enable data consumption for:

Generative AI use cases (e.g. Retrieval-Augmented Generation, AI services, agent workflows)
Analytics and reporting platforms
Downstream operational and business systems
Support feature-style and curated data access patterns required by AI and GenAI workloads.

3. Generative AI Data Enablement

Build and maintain data pipelines that feed Generative AI applications, including:

Curated knowledge and reference datasets
Structured and semi-structured data sources
Metadata, lineage, and traceability for AI consumption
Enable common GenAI data patterns such as:
Retrieval-Augmented Generation (RAG)
Contextual and prompt data preparation
Model input, output, and feedback data flows

4. Engineering Standards & Best Practices

Develop production-grade data pipelines using:

Python
SQL
Apache Spark
Implement automated testing, CI/CD, and deployment practices for data workloads.
Ensure data solutions are:
Observable
Resilient
Performant
Cost-efficient
Continuously improve data quality, reliability, and operational stability.

5. Collaboration & Ways of Working

Act as a senior engineer within a cross-functional product squad.
Collaborate closely with:
Product Owners
AI / Machine Learning Engineers
Analytics teams
Platform and security teams
Provide engineering input into design discussions and delivery decisions.
Support peer reviews and contribute to shared engineering standards.
Provide mentorship and technical guidance, including involvement in the development of AI Engineers.

6. Risk, Governance & Run

Ensure all data solutions comply with enterprise security, risk, and governance standards.
Support the operational stability of data pipelines used by analytics and AI workloads.
Participate in incident resolution and root cause analysis.
Maintain appropriate technical documentation and runbooks.

Required Background & Experience:

10-15 years of industry experience in data engineering or related fields.
5 years operating as a Senior or Lead Data Engineer.

Mandatory Technical Skills (with minimum experience):

Databricks (hands-on): 2 years
Enterprise data lake / lakehouse architecture: 5 years
Python: 5 years
SQL: 5 years
Apache Spark: 5 years
Production-grade data platforms: 3 years
Enterprise or regulated environments: 5 years

Mandatory Skills Summary:

Databricks
Data lake and lakehouse architecture
Python
SQL
Apache Spark
Production-grade data platforms
Enterprise or regulated environments

Desirable / Beneficial Skills:

Experience enabling AI, ML, or Generative AI use cases from a data engineering perspective

Familiarity with:

RAG data patterns
Feature-style or AI-serving datasets
Vector-based or embedding-ready data workflows
Experience working in Agile, product-aligned squads
Exposure to cloud-native data platforms such as AWS or Azure

Desired Skills Summary:

AI, ML, or Generative AI
RAG data patterns
Feature-style or AI-serving datasets
Vector or embedding-ready data workflows
Cloud-native data platforms (AWS or Azure)
