Position: Lead Data Engineer
Contract Type: Fixed term / Contract
Contract Duration: 25 May 2026 to December 2026
Work Model: Hybrid (2-3 days a week)
Work Location: Sandton, Johannesburg, South Africa (Hybrid / Office-based as required)
Role Overview
We are seeking a Lead / Senior Data Engineer to design, build and operate modern Databricks and Lakehouse data platforms that support advanced analytics, AI and Generative AI use cases.
This role is a senior individual contributor position operating within product-aligned, cross-functional squads. The successful candidate will deliver high-quality, governed, scalable data assets consumed by analytics platforms, machine learning models and Generative AI solutions, including LLM- and agent-based systems.
Key Responsibilities
1. Databricks & Data Platform Engineering
- Design, build and operate data solutions using Databricks, including:
  - Delta Lake
  - Databricks Jobs and Workflows
  - Unity Catalog
  - Notebooks and shared libraries
- Develop scalable, reliable Lakehouse architectures supporting analytics and AI workloads.
2. Data Enablement & Consumption
- Enable data consumption for:
  - Generative AI use cases (e.g. Retrieval-Augmented Generation, AI services, agent workflows)
  - Analytics and reporting platforms
  - Downstream operational and business systems
- Support feature-style and curated data access patterns required by AI and GenAI workloads.
3. Generative AI Data Enablement
- Build and maintain data pipelines that feed Generative AI applications, including:
  - Curated knowledge and reference datasets
  - Structured and semi-structured data sources
  - Metadata, lineage and traceability for AI consumption
- Enable common GenAI data patterns, such as:
  - Retrieval-Augmented Generation (RAG)
  - Contextual and prompt data preparation
  - Model input, output and feedback data flows
4. Engineering Standards & Best Practices
- Develop production-grade data pipelines using:
  - Python
  - SQL
  - Apache Spark
- Implement automated testing, CI/CD and deployment practices for data workloads.
- Ensure data solutions are:
  - Observable
  - Resilient
  - Performant
  - Cost-efficient
- Continuously improve data quality, reliability and operational stability.
5. Collaboration & Ways of Working
- Act as a senior engineer within a cross-functional product squad.
- Collaborate closely with:
  - Product Owners
  - AI / Machine Learning Engineers
  - Analytics teams
  - Platform and security teams
- Provide engineering input into design discussions and delivery decisions.
- Support peer reviews and contribute to shared engineering standards.
- Provide mentorship and technical guidance, including supporting the development of AI Engineers.
6. Risk, Governance & Run
- Ensure all data solutions comply with enterprise security, risk and governance standards.
- Support the operational stability of data pipelines used by analytics and AI workloads.
- Participate in incident resolution and root cause analysis.
- Maintain appropriate technical documentation and runbooks.
Required Background & Experience:
- 10-15 years of industry experience in data engineering or related fields.
- 5 years operating as a Senior or Lead Data Engineer.
- Mandatory Technical Skills (minimum experience):
  - Databricks (hands-on): 2 years
  - Enterprise data lake / lakehouse architecture: 5 years
  - Python: 5 years
  - SQL: 5 years
  - Apache Spark: 5 years
  - Production-grade data platforms: 3 years
  - Enterprise or regulated environments: 5 years
Mandatory Skills Summary:
- Databricks
- Data lake and lakehouse architecture
- Python
- SQL
- Apache Spark
- Production-grade data platforms
- Enterprise or regulated environments
Desirable / Beneficial Skills:
- Experience enabling AI, ML or Generative AI use cases from a data engineering perspective
- Familiarity with:
  - RAG data patterns
  - Feature-style or AI-serving datasets
  - Vector-based or embedding-ready data workflows
- Experience working in Agile, product-aligned squads
- Exposure to cloud-native data platforms such as AWS or Azure
Desired Skills Summary:
- AI, ML or Generative AI
- RAG data patterns
- Feature-style or AI-serving datasets
- Vector or embedding-ready data workflows
- Cloud-native data platforms (AWS or Azure)