Our client is a fast-growing Property Tech AI company
About the role
They are seeking a versatile Data & AI Engineer to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations and build scalable ETL/ELT workflows, learning fast while working in the AI agent space.
Key Responsibilities
Data Modeling & Pipeline Development
- Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in varied formats, containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
- Own the chunking, embedding, and indexing strategy for all unstructured and structured data, so downstream RAG/agent systems can retrieve it efficiently (see the ingestion sketch after this list)
- Build, test, and maintain robust ETL/ELT workflows using Spark (batch and streaming)
- Define and implement logical/physical data models and schemas; develop schema-mapping and data-dictionary artifacts for cross-system consistency
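For illustration, here is a minimal sketch of what a chunk-embed-index ingestion step might look like. Everything in it is an assumption for the example: the fixed-size overlapping chunker is one common baseline rather than the client's actual strategy, `embed_batch` stands in for whatever embedding endpoint the stack uses (e.g., an Azure OpenAI deployment), and a plain dict stands in for a real vector index.

```python
# Minimal sketch of a chunk -> embed -> index step for RAG ingestion.
# Assumptions: `embed_batch` stands in for a real embeddings endpoint,
# and the dict `store` stands in for a real vector index.
from typing import Callable, Dict, List

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows with overlap, so content
    cut at a chunk boundary still appears intact in a neighbouring chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def index_document(
    doc_id: str,
    text: str,
    embed_batch: Callable[[List[str]], List[List[float]]],
    store: Dict[str, dict],
) -> None:
    """Chunk a document, embed each chunk, and record (vector, text, metadata)
    under a stable key so retrieval can cite its source."""
    chunks = chunk_text(text)
    for pos, (chunk, vector) in enumerate(zip(chunks, embed_batch(chunks))):
        store[f"{doc_id}:{pos}"] = {
            "vector": vector,
            "text": chunk,
            "source": doc_id,
            "chunk_no": pos,
        }
```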
Gen AI Integration
- Instrument data pipelines to surface real-time context into LLM prompts
- Implement prompt engineering and RAG for varied workflows within the real estate/construction industry vertical (see the retrieval sketch below)
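As a companion to the ingestion sketch above, the fragment below shows one way the retrieval half might surface context into an LLM prompt: rank stored chunks by cosine similarity to a query embedding and splice the top hits into the input. The `store` layout follows the earlier sketch, and the prompt template is purely illustrative.

```python
# Sketch of the retrieval half of RAG: rank stored chunks by cosine
# similarity to a query embedding and splice the top hits into a prompt.
# The `store` layout follows the ingestion sketch above; the template
# is illustrative, not the client's actual prompt.
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_prompt(question: str, query_vec: List[float],
                 store: Dict[str, dict], k: int = 3) -> str:
    """Return an LLM prompt with the k most relevant chunks as context."""
    ranked = sorted(store.values(),
                    key=lambda rec: cosine(query_vec, rec["vector"]),
                    reverse=True)
    context = "\n---\n".join(rec["text"] for rec in ranked[:k])
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```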
Observability & Governance
- Implement monitoring, alerting, and logging (data quality, latency, errors)
- Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM)
CI/CD & Automation
- Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow)
- Maintain reproducible environments with infrastructure as code (Terraform, ARM templates)
Required Skills & Experience
- 5 years in data engineering or a similar role, with at least 12-24 months of experience building pipelines for unstructured data extraction, including document processing with OCR, cloud-native solutions, and chunking/indexing for downstream consumption by RAG/Gen AI applications.
- Proficiency in Python: dlt for ETL/ELT pipelines, DuckDB or equivalent tools for in-process analytical work, and DVC for managing large files efficiently (a minimal dlt/DuckDB sketch follows this list).
- Solid SQL skills and experience designing and scaling relational databases. Familiarity with non-relational, column-based databases is preferred.
- Familiarity with an orchestration tool; Prefect is preferred, but others (e.g., Azure Data Factory) are also considered.
- Proficiency with the Azure ecosystem; you should have worked with Azure services in production.
- Familiarity with RAG indexing, chunking, and storage across file types for efficient retrieval.
- Strong DevOps/Git workflows and CI/CD (CircleCI/Azure DevOps)
- Experience deploying ML artifacts using MLflow, Docker, or Kubernetes is good to have.
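To make the dlt/DuckDB expectation concrete, here is a minimal sketch of a dlt pipeline loading rows into a local DuckDB file and querying it in-process. It assumes `pip install "dlt[duckdb]"`; the pipeline, dataset, and table names are illustrative, and dlt's default behavior is to write a `<pipeline_name>.duckdb` file in the working directory.

```python
# Minimal dlt -> DuckDB sketch; assumes `pip install "dlt[duckdb]"`.
# Pipeline, dataset, and table names here are illustrative only.
import dlt
import duckdb

rows = [
    {"property_id": 1, "doc_type": "floor_plan"},
    {"property_id": 2, "doc_type": "schedule"},
]

pipeline = dlt.pipeline(
    pipeline_name="ingest_demo",
    destination="duckdb",        # dlt writes ingest_demo.duckdb by default
    dataset_name="raw_docs",
)
load_info = pipeline.run(rows, table_name="documents")
print(load_info)

# DuckDB lets you analyze the loaded data in-process, no server needed.
con = duckdb.connect("ingest_demo.duckdb")
print(con.sql("SELECT doc_type, count(*) FROM raw_docs.documents GROUP BY 1"))
```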
Bonus skillsets:
- Experience with computer-vision-based extraction or with building ML models for production
- Knowledge of agentic AI system design: memory, tools, context, orchestration
- Knowledge of data governance, privacy laws (GDPR), and enterprise security patterns
They are an early-stage startup, so you are expected to wear many hats and work outside your comfort zone, with real and direct impact in production.
Why our client
- Fast-growing, revenue-generating proptech startup
- Flat, no-BS environment with high autonomy for the right talent
- Steep learning opportunities in real-world enterprise production use cases
- Remote work with quarterly meet-ups
- Multi-market, multi-cultural client exposure