AI Engineering Lead

Ridgewood, NJ - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

Key Responsibilities

Architecture & System Design

Architect, design, and lead multi-agent LLM systems using LangGraph, LangChain, and Promptfoo for prompt lifecycle management and benchmarking.
Build Retrieval-Augmented Generation (RAG) pipelines leveraging hybrid vector search (dense + keyword) using LanceDB, Pinecone, or Elasticsearch.
Define system workflows for summarization, query routing, retrieval, and response generation, ensuring minimal latency and high precision.
Develop RAG evaluation frameworks combining retrieval precision/recall, hallucination detection, and latency metrics, aligned with analyst and business use cases.

AI Model Integration & Fine-Tuning

Integrate GPT-4o, PaLM 2, and open-weight models (LLaMA, Mistral) for task-specific contextual Q&A.
Fine-tune transformer models (BERT, SentenceTransformers) for document classification, summarization, and sentiment analysis.
Manage prompt routing and variant testing using Promptfoo or equivalent tools.

Agentic AI & Orchestration

Implement multi-agent architectures with modular flows, enabling task-specific agents for summarization, retrieval, classification, and reasoning.
Design fallback and recovery behaviors to ensure robustness in production.
Employ LangGraph for parallel and stateful agent orchestration, error recovery, and deterministic flow control.

Data Engineering & RAG Infrastructure

Architect ingestion pipelines for structured and unstructured data, including financial statements, filings, and PDF documents.
Leverage MongoDB for metadata storage and Redis Streams for async task execution and caching.
Implement vector-based search and retrieval layers for high-throughput and low-latency AI systems.

Observability & Production Deployment

Deploy end-to-end AI systems on AWS EKS / Azure Kubernetes Service, integrated with CI/CD pipelines (Azure DevOps).
Build comprehensive monitoring dashboards using OpenTelemetry and SigNoz, tracking latency, retrieval precision, and application health.
Enforce testing and regression validation using golden datasets and structured assertion checks for all LLM responses.

Cross-functional Collaboration

Collaborate with DevOps, MLOps, and application development teams to integrate AI APIs with React / FastAPI-based user interfaces.
Work with business analysts to translate credit, compliance, and customer-support requirements into actionable AI agent workflows.
Mentor a small team of GenAI developers and data engineers in RAG, embeddings, and orchestration techniques.

Qualifications:

Experience:
- 5 years as an AI or ML Engineer

Required Skills & Experience

LLMs & GenAI: GPT-4o, PaLM 2, LangGraph, LangChain, Promptfoo, SentenceTransformers
RAG Frameworks: LanceDB, Pinecone, Elasticsearch, FAISS, MongoDB
Agentic AI: LangGraph, multi-agent orchestration, routing logic, task decomposition
Fine-Tuning: BERT / domain-specific transformer tuning, evaluation framework design
Infra & MLOps: FastAPI, Docker, Kubernetes (EKS/AKS), Redis Streams, Azure DevOps CI/CD
Monitoring: OpenTelemetry, SigNoz, Prometheus
Languages & Tools: Python, SQL, REST APIs, Git, Pandas, NumPy

Nice-to-Have Skills

Knowledge of reranker-based retrieval (MiniLM / CrossEncoder)
Familiarity with prompt evaluation and scoring (BLEU, ROUGE, faithfulness)
Domain exposure to credit risk, banking, and investment analytics
Experience with RAG benchmark automation and model evaluation dashboards

Additional Information:

Remote Work:

Yes

Employment Type:

Full-time

About Company

Vichara

Vichara is a financial-services-focused products and services firm headquartered in New York that builds systems for some of the largest investment banks and hedge funds in the world.