[VCK] Senior Data Engineer (AI Ingestion Platform)

Software Mind

Job Location:

Buenos Aires - Argentina

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About the Project

Software Mind is building a private, tenant-isolated AI assistant for the real estate title and settlement industry. The platform is a retrieval-first (RAG) system that ingests historical email, documents, and structured metadata into a per-tenant vector index and serves grounded, cited, expert-weighted answers through a chat-style Q&A interface with single sign-on and full audit logging.

The platform is AWS-native, with a Python/FastAPI backend, a frontend, an OpenSearch/Pinecone vector store, and OpenAI/Anthropic/Bedrock as the LLM provider. You will join a senior, cross-functional, LATAM-based team where hands-on AI delivery experience, not just familiarity, is the baseline expectation.

You own the ingestion and processing backbone of the platform: the pipelines that transform raw email and document corpora into clean, PII-minimised, chunked, and indexed data in the per-tenant vector store. This is the foundational layer the AI extraction gateway depends on; quality here directly determines system accuracy.

Your Responsibilities

Build and own the historical email ingestion pipeline via Microsoft Graph API

Implement SharePoint / OneDrive document ingestion pipeline with scoped folder access

Design and implement the PII minimisation pre-processing layer

Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation

Define and implement the data processing schema; produce and maintain schema documentation

Build the OCR routing orchestrator and integrate OCR service for scanned documents

Implement the raw text / content extraction layer for all supported document types

Define and prototype the push vs. pull ingestion strategy, from one-time PoC through to an incremental nightly pipeline

Ensure data lineage and audit traceability are built into pipeline outputs from the outset

Qualifications:

Must-Have Skills & Experience

6 years in data engineering; strong pipeline and ETL/ELT experience required

Proficiency in Python for data pipeline development

Experience with Microsoft Graph API or similar enterprise email/document APIs (M365 Exchange Online)

AWS data services: S3, DynamoDB, Glue, and/or Lambda-based event-driven processing

Familiarity with PII detection and data minimisation techniques (regex-based, NER-based, or purpose-built libraries)

Experience with vector store indexing or semantic search pipeline construction

Additional Information:

Nice-to-Have

Prior experience building ingestion pipelines specifically for AI/ML, NLP, or LLM-based platforms

OCR tooling experience: AWS Textract, Tesseract, or commercial OCR services

Understanding of per-tenant data isolation patterns, tenant-scoped encryption, and row-level security

Familiarity with LangChain document loaders, embedding pipelines, or vector index management

We are accepting applications from LATAM countries

Remote Work:

Yes

Employment Type:

Full-time

About Company

Software Mind

Software Mind develops solutions that make an impact for companies around the globe. Tech giants and unicorns, transformative projects, emerging technologies, and limitless opportunities: these are a few words that describe an average day for us. Building cross-functional engineering te ...
