Senior AI Engineer (GenAI + Data Platform – AWS)
Job Location:
Irvine, CA - USA
Monthly Salary:
Not Disclosed
Posted on:
6 hours ago
Vacancies:
1 Vacancy
Job Summary
Role: Senior AI Engineer (GenAI + Data Platform - AWS)
Location: Onsite 4 days a week is a must (3 days in Irvine, CA and 1 day in Downtown LA, CA)
Job Type: Contract
Role Summary:
- We are seeking a Senior AI Engineer to design, build, and scale a production-grade Generative AI and Data Platform on AWS.
- The role focuses on enabling LLM-powered capabilities through vector search, graph-based knowledge systems, and governed data pipelines.
Must Have Skills:
- Generative AI / LLM (RAG, embeddings, prompt engineering)
- AWS Cloud (OpenSearch, Neptune, DynamoDB, ElastiCache/Redis)
- Vector Search & Retrieval Systems (OpenSearch / vector DB)
- Graph Databases (Amazon Neptune, knowledge graphs)
- LLM Frameworks (LangChain / LlamaIndex)
- Agentic AI Frameworks (LangGraph / AutoGen / CrewAI)
- Databricks & Apache Spark (data pipelines, embedding pipelines)
- Backend/API Development (Python, scalable APIs, microservices)
Must Have Certifications:
AWS Certification (Preferred):
- AWS Certified Solutions Architect OR
- AWS Certified Machine Learning Specialty OR
- AWS Data Engineer Certification
The ideal candidate will own end-to-end delivery across the AI lifecycle including:
- Data ingestion and knowledge curation
- Embeddings and retrieval systems
- Backend services and APIs
- CI/CD pipelines and deployment
Key Responsibilities:
1. GenAI Enablement & Integration
Build and operationalize LLM-powered applications using:
- Retrieval-Augmented Generation (RAG)
- Embeddings pipelines
- Prompt orchestration and evaluation frameworks
- Design and implement vector search systems using Amazon OpenSearch
- Develop graph-based knowledge systems using Amazon Neptune for relationships, lineage, and explainability
Integrate supporting infrastructure:
- Amazon ElastiCache (Redis) for session state and caching
- DynamoDB for scalable, low-latency data access
Implement agentic workflows using frameworks such as:
- LangGraph, AutoGen, CrewAI (or equivalent)
Integrate with LLM frameworks like:
- LangChain, LlamaIndex (tool calling, retrieval orchestration, context management)
Define standards for:
- Tool integration
- Context-sharing patterns (MCP-style designs)
Evaluate LLM models and retrieval strategies across:
- Latency
- Cost
- Accuracy
- Context limitations
2. Data Pipelines & Knowledge Engineering
Design and build scalable data pipelines using Databricks and Apache Spark
Implement:
- Data ingestion and transformation pipelines
- Document processing (chunking metadata tagging)
- Embedding generation and indexing
Ensure high data quality standards:
- Validation, completeness, consistency, monitoring
Implement data governance frameworks:
- Data classification and access controls
- Retention policies
- Auditability and lineage tracking
3. Backend Services & APIs
Develop backend services exposing AI capabilities through secure and scalable APIs
Define best practices for:
- API contracts and versioning
- Reliability (retry logic, circuit breakers, idempotency)
Enable reusability of platform capabilities across teams and applications.
4. Deployment, MLOps & Operational Excellence
Build and manage CI/CD pipelines for AI and data workloads
Deploy production systems using:
- Docker (containerization)
- Kubernetes (orchestration)
Implement deployment strategies:
- Blue/green deployments
- Canary releases
- Rollback strategies
- Feature flags
Ensure system reliability through:
- Monitoring (latency, failures, cost, data freshness)
- Alerting and observability
- Secrets management and least-privilege access
Optimize platform performance and cost
5. LLM Observability, Evaluation & Quality
Define and track GenAI quality metrics:
- Grounding / faithfulness
- Retrieval relevance
- Response consistency
- Latency and cost per request
Implement:
- Prompt/version tracking
- Offline evaluation pipelines
- Continuous improvement workflows
6. LLM Security, Safety & Compliance
Implement secure AI systems with:
- Access control and authentication
- Data protection policies
- Responsible AI guardrails
Ensure compliance with best practices in:
- AI safety
- Data privacy
- Monitoring and auditability
Required Skills:
- Strong experience in Generative AI / LLM systems (RAG, embeddings, prompt engineering)
- Hands-on experience with the AWS ecosystem
Expertise in:
- OpenSearch (vector search)
- Neptune (graph databases)
- DynamoDB and Redis (ElastiCache)
Experience with:
- LangChain / LlamaIndex
- Agentic AI frameworks (LangGraph, AutoGen, CrewAI)
- Strong programming skills (Python preferred)
- Experience with Databricks and Apache Spark
Solid understanding of:
- Data pipelines
- Distributed systems
- API design
Preferred Skills:
Experience with:
- Model evaluation frameworks and LLM observability tools
- AI governance and compliance frameworks
- Kubernetes and advanced MLOps practices
Familiarity with:
- Model Context Protocol (MCP) patterns
- Agent-based architectures
Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, AI, or a related field
- Proven experience building production-grade AI platforms and systems
- Strong background in end-to-end AI/ML lifecycle delivery.