Senior AI Engineer (GenAI + Data Platform – AWS)

Cloudious LLC


Job Location:

Irvine, CA - USA

Monthly Salary: Not Disclosed
Posted on: 3 hours ago
Vacancies: 1 Vacancy

Job Summary

2-3 days/week in the client's Irvine office, 1 day in their downtown LA office, 1 day remote

$75/hr on C2C

2 Openings

Must Have Skills
Skill 1: Generative AI / LLM (RAG, embeddings, prompt engineering)
Skill 2: AWS Cloud (OpenSearch, Neptune, DynamoDB, ElastiCache/Redis)
Skill 3: Vector Search & Retrieval Systems (OpenSearch / vector DB)
Skill 4: Graph Databases (Amazon Neptune, knowledge graphs)
Skill 5: LLM Frameworks (LangChain / LlamaIndex)
Skill 6: Agentic AI Frameworks (LangGraph / AutoGen / CrewAI)
Skill 7: Databricks & Apache Spark (data pipelines, embedding pipelines)
Skill 8: Backend/API Development (Python, scalable APIs, microservices)

Domain Experience (if any)
AI/ML Platform Engineering
Generative AI / LLM Applications
Data Platform / Big Data Engineering

Must Have Certifications
AWS Certification (Preferred):
AWS Certified Solutions Architect OR
AWS Certified Machine Learning Specialty OR
AWS Data Engineer Certification

Prior UST Experience
No
If yes, provide dates and details of account/project

Location: 2-3 days/week in the client's Irvine office, 1 day in their downtown LA office, 1 day remote

Onsite Requirement: Yes
Number of days onsite: 4 days

Job Description: Senior AI Engineer (GenAI + Data Platform – AWS)


Role Summary:


We are seeking a Senior AI Engineer to design, build, and scale a production-grade Generative AI and Data Platform on AWS. The role focuses on enabling LLM-powered capabilities through vector search, graph-based knowledge systems, and governed data pipelines.
The ideal candidate will own end-to-end delivery across the AI lifecycle, including:

Data ingestion and knowledge curation
Embeddings and retrieval systems
Backend services and APIs
CI/CD pipelines and deployment

This role will partner closely with product and engineering teams to operationalize AI capabilities in externally facing applications and to drive the evolution toward agentic AI systems.

Key Responsibilities
1. GenAI Enablement & Integration

Build and operationalize LLM-powered applications using:

Retrieval-Augmented Generation (RAG)
Embeddings pipelines
Prompt orchestration and evaluation frameworks

Design and implement vector search systems using Amazon OpenSearch
Develop graph-based knowledge systems using Amazon Neptune for relationships, lineage, and explainability
Integrate supporting infrastructure:

Amazon ElastiCache (Redis) for session state and caching
DynamoDB for scalable low-latency data access

Implement agentic workflows using frameworks such as:

LangGraph, AutoGen, CrewAI (or equivalent)


Integrate with LLM frameworks like:

LangChain, LlamaIndex (tool calling, retrieval orchestration, context management)


Define standards for:

Tool integration
Context-sharing patterns (MCP-style designs)


Evaluate LLMs and retrieval strategies across:

Latency
Cost
Accuracy
Context limitations


2. Data Pipelines & Knowledge Engineering

Design and build scalable data pipelines using Databricks and Apache Spark
Implement:

Data ingestion and transformation pipelines
Document processing (chunking, metadata tagging)
Embedding generation and indexing

Ensure high data quality standards:

Validation, completeness, consistency, monitoring

Implement data governance frameworks:

Data classification and access controls
Retention policies
Auditability and lineage tracking


3. Backend Services & APIs

Develop backend services exposing AI capabilities through secure and scalable APIs
Define best practices for:

API contracts and versioning
Reliability (retry logic, circuit breakers, idempotency)

Enable reusability of platform capabilities across teams and applications


4. Deployment, MLOps & Operational Excellence

Build and manage CI/CD pipelines for AI and data workloads
Deploy production systems using:

Docker (containerization)
Kubernetes (orchestration)

Implement deployment strategies:

Blue/green deployments
Canary releases
Rollback strategies
Feature flags

Ensure system reliability through:

Monitoring (latency, failures, cost, data freshness)
Alerting and observability
Secrets management and least-privilege access

Optimize platform performance and cost

5. LLM Observability, Evaluation & Quality

Define and track GenAI quality metrics:

Grounding / faithfulness
Retrieval relevance
Response consistency
Latency and cost per request

Implement:

Prompt/version tracking
Offline evaluation pipelines
Continuous improvement workflows


6. LLM Security, Safety & Compliance

Implement secure AI systems with:

Access control and authentication
Data protection policies
Responsible AI guardrails


Ensure compliance with best practices in:

AI safety
Data privacy
Monitoring and auditability


Required Skills

Strong experience in Generative AI / LLM systems (RAG, embeddings, prompt engineering)
Hands-on experience with AWS ecosystem
Expertise in:

OpenSearch (vector search)
Neptune (graph databases)
DynamoDB and Redis (ElastiCache)


Experience with:

LangChain / LlamaIndex
Agentic AI frameworks (LangGraph, AutoGen, CrewAI)


Strong programming skills (Python preferred)
Experience with Databricks and Apache Spark
Solid understanding of:

Data pipelines
Distributed systems
API design



Preferred Skills

Experience with:

Model evaluation frameworks and LLM observability tools
AI governance and compliance frameworks
Kubernetes and advanced MLOps practices


Familiarity with:

Model Context Protocol (MCP) patterns
Agent-based architectures


Qualifications

Bachelor's or Master's degree in:

Computer Science / Data Science / AI / related field


Proven experience building production-grade AI platforms and systems
Strong background in end-to-end AI/ML lifecycle delivery


Soft Skills

Strong problem-solving and analytical thinking
Ability to communicate complex AI concepts clearly
Collaborative and cross-functional mindset
Ownership-driven and proactive execution
