Software Developer/Engineer (Mid-Level Experience), Philadelphia
Location: Philadelphia
Work Mode: Hybrid, minimum 3 days in the office
Interview Schedule: 1st interview, 1 hour in person; 2nd interview, 1 hour in person
Consultant Requirements: On-Prem LLM & Vector DB Implementation
Core Experience
Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
Strong proficiency in Python for LLM inference, prompt engineering, and integration
Experience with CPU-based inference, model quantization, and performance tuning
Vector Databases & RAG
Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
Experience generating and managing embeddings and applying metadata filtering
Security & Governance
Understanding of data privacy, air-gapped deployments, and enterprise security requirements
Experience implementing access controls and audit logging
Nice to Have
Experience with LangChain or LlamaIndex
Exposure to Rust, Go, or C for high-performance services
Familiarity with Docker and Kubernetes for on-prem deployments
Knowledge of inference frameworks (e.g., vLLM, Hugging Face Transformers)
Prior work in regulated or enterprise environments
Deliverables
Reference architecture and deployment guidance
Working prototype (LLM, vector DB, RAG)
Documentation and knowledge transfer to internal teams