Principal AI Architect
San Jose, CA - USA
Job Summary
Position: Principal AI Architect
Location: San Jose, CA (Onsite)
Duration: 1 Year
Job Description:
AI Architecture & System Design
Design AI system architectures: multi-agent orchestration layers, RAG pipelines, hybrid retrieval systems (knowledge graphs, vector search), text-to-SQL engines and real-time inference APIs.
Define and own technical blueprints for new AI products - from data ingestion and embedding pipelines through to response generation, evaluation and production monitoring.
Solve hard engineering problems: latency, precision/recall trade-offs, context window management, hallucination mitigation and cost-efficient LLM usage at scale.
Make deliberate, well-documented architecture decisions with clear trade-off analysis (build vs. buy, framework selection, deployment topology).
Implementation
Write production-quality code - Python, SQL, API services - across the full AI lifecycle: data qualification, model training, evaluation, containerised deployment and API serving.
Build and own reusable, framework-quality components (chunking pipelines, retrieval layers, agent tool-calling modules) that accelerate team velocity.
Own CI/CD pipelines, Docker-based deployment and production telemetry for AI services.
AI Market Intelligence & Technology Strategy
Track and evaluate the AI landscape - new LLMs, agentic frameworks (LangGraph, Google ADK, CrewAI, AutoGen), retrieval methods, fine-tuning techniques and emerging tooling.
Translate AI market trends into actionable roadmap inputs - surfacing opportunities for step-change capability improvements before competitors do.
Cross-Functional Technical Partnership
Partner closely with Product, Data Science and Platform Engineering to align AI architecture with product direction, data constraints and infrastructure capabilities.
Communicate complex technical trade-offs clearly to non-technical stakeholders - translating architecture decisions into business impact narratives.
Must-Have Experience
12 years of hands-on experience in AI/ML engineering and data science, with significant depth in production system delivery.
Deep working expertise in LLM application development: LangChain, LangGraph, tool-calling agents, RAG, prompt engineering, embedding pipelines and hybrid retrieval.
Proven track record architecting and shipping multi-agent systems, knowledge graph-powered retrieval (Neo4j or equivalent) and real-time inference APIs.
Strong ML fundamentals: XGBoost, deep learning, NLP, time-series forecasting, propensity modelling, experimental design and causal inference.
Experience delivering AI systems in regulated industries (financial services, cybersecurity, healthcare) with SOX, GDPR or SOC 2 compliance awareness.
Expert-level Python and SQL; fluency with GCP, AWS, Docker, FastAPI, BigQuery, FAISS and CI/CD tooling.
Technical Depth
Ability to design hybrid retrieval architectures that balance precision (graph traversal) and semantic recall (vector similarity) with reranking layers - not just off-the-shelf RAG.
Hands-on experience reducing LLM inference latency in production (e.g. redesigning pipelines from multi-minute to sub-30-second response times).
QUALIFICATIONS
Master's or PhD in Computer Science, Operations Research, Statistics or a related quantitative field.
AWS Certified Machine Learning Engineer or GCP Professional ML Engineer certification.
Completion of an AI Strategy or AI Governance programme.
Prior experience at a data science / ML services firm, enterprise SaaS company or fintech - where you shipped AI to external customers, not just internal tools.
Hands-on experience with Snowflake Cortex or comparable enterprise LLM deployment platforms.
Open-source contributions to AI/ML tooling, published technical writing or conference presentations.