Build an intelligent, data-driven platform. The focus is to support the development of next-generation test analytics and test agents that enable faster insights, improved diagnostics, and scalable infrastructure for Generative AI systems, connecting test stations, line-level data, and pipelines. You will build automated evaluation tools and conduct rigorous statistical analyses to ensure the reliability of both human and AI-based assessment systems.
Benchmark, adapt, and integrate AI/ML models into existing software systems. Independently run and analyze ML experiments that deliver real improvements.
Must-Have Requirements
Backend/Systems Experience: 3 years building production backend or distributed systems (pre-AI experience required)
Production AI Systems: Has shipped AI/LLM features serving real users at scale, not just prototypes or demos
Agentic Systems: Has built AI agents, skills, tools, or MCP (Model Context Protocol) integrations
Python: Proficient for backend development
Secondary Language: Working knowledge of Go, TypeScript, or Rust
Cloud Infrastructure: Deep experience with AWS/GCP/Azure, including cost optimization and compute decisions, not just deployment
Container & Orchestration: Hands-on with Docker and Kubernetes; can build, deploy, debug, and scale services themselves
LLM Integration: Understands token economics, context limits, rate limiting, structured outputs, and API failure modes
LLM Evaluation: Understands how to evaluate LLM outputs and the inherent challenges (non-determinism, quality measurement, regression detection)
Hands-On Engineer: Not just an architect; writes code, debugs production issues, and deploys their own work
Preferred / Differentiators
Built multi-step agentic workflows with tool use and function calling
Experience with agent orchestration frameworks (LangGraph, CrewAI, or custom)
Built guardrails, fallbacks, or graceful degradation for AI systems
Streaming inference and async agent orchestration
Cost/latency optimization: caching, batching, prompt compression
ML observability tools: Langfuse, Arize, Braintrust, W&B
Retrieval systems (vector search, hybrid search), as a tool rather than the focus