Send candidates who have strong AI/LLM and backend development experience:
- LLM product experience: prompting, tool calling, evals, latency/cost tradeoffs.
- Agent architectures: planning/execution loops, memory/state, sandboxed tools, human-in-the-loop (HITL) safety constraints.
- Frameworks/SDKs: Vercel AI SDK, LangChain/LangGraph, Anthropic Agents, OpenAI tool calling, sandboxed runtimes.
- Infra familiarity: Kubernetes, serverless, stream processing, feature stores, vector search.
Job Description:
The Role
You'll own the backend systems that make AI agents reliable in production: agent runtime services, integrations, data modeling, observability, and platform reliability. You'll design, ship, measure, and harden systems that power real customer workflows at scale.
What You'll Do
- Own agent runtime services: tool execution, state management, orchestration, retries/idempotency, rate limiting.
- Design APIs & contracts: stable, versioned internal/external APIs; webhooks/events; integration adapters.
- Model complex domain data: schemas for agent memory/state, workflow history, audit trails, permissions, multi-tenant isolation.
- Build integrations at scale: OAuth, webhooks, sync engines, and connectors with robust observability and failure handling.
- Reliability engineering: define SLIs/SLOs and error budgets; implement tracing, timeouts, and circuit breakers; drive incident response.
- Performance & cost controls: optimize latency/throughput across queues, caches, and storage; manage inference/tool-call costs and rein in runaway tasks.
- Raise the bar: code quality, testing strategy, on-call hygiene, runbooks, postmortems, mentoring.
What We're Looking For (Required)
- 6 years of experience building backend systems for production SaaS platforms or distributed systems.
- Strong fundamentals in distributed systems, concurrency, queues/workers, caching, and production ops.
- Data modeling depth: relational design (Postgres/MySQL), migrations, indexing, query optimization, data correctness.
- API design excellence: clear, evolvable contracts across internal services and external partners.
- Thrive in high-velocity environments without compromising reliability or security.
- Ownership mindset: build, ship, operate; comfortable with ambiguity and rapid iteration.
- LLM product experience: prompting, tool calling, evals, latency/cost tradeoffs.
- Agent architectures: planning/execution loops, memory/state, sandboxed tools, HITL safety constraints.
- Frameworks/SDKs: Vercel AI SDK, LangChain/LangGraph, Anthropic Agents, OpenAI tool calling, sandboxed runtimes.
- Infra familiarity: Kubernetes, serverless, stream processing, feature stores, vector search.