Applied AI Engineer — Inference & Agent Systems

Arcana Analytics


Job Location:

Bengaluru - India

Monthly Salary: Not Disclosed
Posted on: 19 hours ago
Vacancies: 1 Vacancy

Job Summary

Title: Applied AI Engineer — Inference & Agent Systems

Location: BLR / Remote (India)

What We're Building

Arcana is building AI agents that synthesize information across heterogeneous sources and deliver structured, reasoned answers in real time. The product only works if the agents are fast, reliable, and correct, not approximately correct.

Our stack: Go, Temporal for orchestration, a Plan-Execute-Synthesize agent architecture, and an evaluation harness we use to measure every regression. The problems are hard. The latency bar is aggressive. The accuracy requirements are unforgiving.

The Work

Inference Optimization

- Drive TTFT (time to first token) below 400ms for multi-step agent pipelines

- Streaming optimization: first token to the user while sub-agents are still running

- KV cache strategy, prompt compression, dynamic context window management

- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models
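As a rough sketch of what multi-provider routing can look like, here is a minimal Python illustration (all names, numbers, and the routing policy are hypothetical, not Arcana's actual router): pick the cheapest model that serves the task type within the caller's TTFT budget.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    name: str
    provider: str
    p95_ttft_ms: float          # observed p95 time-to-first-token for this model
    cost_per_1k_tokens: float
    task_types: frozenset        # task types this model is approved for

def route(profiles, task_type, ttft_budget_ms):
    """Pick the cheapest model that serves this task type within the latency budget."""
    candidates = [
        p for p in profiles
        if task_type in p.task_types and p.p95_ttft_ms <= ttft_budget_ms
    ]
    if not candidates:
        raise LookupError(f"no model meets {ttft_budget_ms}ms budget for {task_type!r}")
    return min(candidates, key=lambda p: p.cost_per_1k_tokens)
```

A real router would also track live latency percentiles per provider and fail over when a provider degrades; this sketch only shows the static selection step.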

Agent Architecture

- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains

- Build reliable orchestration on top of Temporal: retries, timeouts, partial-failure recovery, idempotency

- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation

- Tool call design: schemas that LLMs actually follow reliably across providers
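The structured-output bullet above amounts to a parse-validate-retry loop. A minimal Python sketch (the `call_llm` and `validate` callables are stand-ins; a production system would validate against a real JSON schema rather than a hand-written check):

```python
import json

def enforce_json(call_llm, prompt, validate, max_attempts=3):
    """Call the LLM, parse and validate JSON, and retry with the error fed back."""
    last_error = None
    for attempt in range(max_attempts):
        if attempt == 0:
            full_prompt = prompt
        else:
            # Feed the failure back so the model can self-correct on retry.
            full_prompt = (f"{prompt}\n\nYour last reply was invalid ({last_error}). "
                           "Reply with valid JSON only.")
        raw = call_llm(full_prompt)
        try:
            data = json.loads(raw)
            validate(data)   # raises ValueError on schema violations
            return data
        except (json.JSONDecodeError, ValueError) as exc:
            last_error = exc
    return None  # graceful degradation: caller falls back to a safe default
```

Returning `None` after exhausting retries is one degradation policy; others (fall back to a smaller model, return a partial answer) fit the same shape.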

Evaluation & Harness

- Own the eval framework end to end: ground-truth datasets, automated scoring pipelines, regression detection on every PR

- LLM-as-judge pipelines for qualitative output assessment

- Latency regression testing: p50/p95/p99 tracked across every deployment

- Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses
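Latency regression gating of the kind described above can be sketched as a per-percentile comparison between a baseline and a candidate deploy. A minimal Python illustration (the nearest-rank percentile and the 10% tolerance are arbitrary example choices, not a stated policy):

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile (q in (0, 100]) over a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_regressed(baseline_ms, candidate_ms, tolerance=1.10):
    """Flag the deploy if any tracked percentile grew by more than `tolerance`x."""
    for q in (50, 95, 99):
        if percentile(candidate_ms, q) > tolerance * percentile(baseline_ms, q):
            return True
    return False
```

Checking p99 separately matters because tail regressions are invisible in the median; a deploy can leave p50 flat while doubling p99.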

Infrastructure

- Model serving and cold start optimization

- Async worker architecture for parallel sub-agent execution

- Observability: trace every token, every tool call, every synthesis step
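As a rough illustration of the parallel sub-agent execution mentioned above, an asyncio sketch that fans out independent sub-agents and separates results from partial failures (names are illustrative; the real system uses Temporal workers rather than a bare event loop):

```python
import asyncio

async def run_sub_agents(tasks):
    """Run independent sub-agent coroutines concurrently.

    Returns (results, failures) so the synthesis step can proceed
    on partial results instead of failing the whole pipeline.
    """
    outcomes = await asyncio.gather(*(t() for t in tasks), return_exceptions=True)
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    failures = [o for o in outcomes if isinstance(o, BaseException)]
    return results, failures
```

`return_exceptions=True` is the key choice: one slow or broken sub-agent surfaces as a failure object instead of cancelling its siblings.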

What We're Looking For

You've built something that runs in production at meaningful scale, and you understand why it's fast (or why it isn't).

Strong signal:

- You've worked on inference pipelines where TTFT was the primary metric, and you moved it meaningfully

- You've built multi-step agent systems and you know where they break, not from reading papers but from watching them fail in production

- You've written eval harnesses from scratch, and you have opinions about what makes a ground-truth dataset actually useful

- You've debugged LLM non-determinism in production and built systems resilient to it

- You've worked with streaming LLM responses and built infrastructure around partial-output handling

Weaker signal (but not disqualifying):

- You've fine-tuned models but haven't shipped inference systems

- You've used LangChain/LlamaIndex but haven't built the layer underneath

- Strong ML research background without systems exposure

Stack familiarity (we care more about depth than exact match): Go, Python, Temporal, Kafka, PostgreSQL, Docker

Why This Role

The problems here don't have blog posts about them yet. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, eval harnesses for non-deterministic multi-step systems: these are genuinely unsolved at production quality. Small team. High ownership. Every engineer's decisions ship to production.

Who We Want to Hear From

You've shipped inference systems at:

- A real-time AI product (search, coding assistants, chat at scale)

- A model serving infrastructure company

- An agent platform (any domain)

Or you've built eval/harness infrastructure that a team of 10 engineers actually trusted to catch regressions.

Apply

Send to:

Include:

  1. One system you built where latency was the primary constraint: what you measured, what you changed, what moved
  2. A link to anything public (code, writing, talks); optional but useful
  3. No cover letter required

We respond to every application.


Required Experience:

Unclear Seniority

