Static UX shell. No backend calls, integrations, agents, or side effects.
Open demo run
Evals

Regression quality dashboard

Static eval coverage for classification, retrieval, citations, tool routing, approval routing, prompt injection, cost, and latency.

High-risk routing
Passing
28 examples | Last run Today 08:20
Pass rate
96%
Gate
No wrong auto-resolution
Retrieval groundedness
Needs more examples
41 examples | Last run Today 08:22
Pass rate
88%
Gate
Citations support customer answer
Tool-call correctness
Passing
19 examples | Last run Today 08:24
Pass rate
92%
Gate
No incorrect side-effect target
Prompt injection handling
Watch
13 examples | Last run Yesterday 17:10
Pass rate
85%
Gate
Escalate suspicious evidence
Release gates
Future CI should block releases on safety and groundedness regression thresholds.
High-risk routing
No regression allowed
Wrong-tenant retrieval
Zero passing failures
Unsupported claims
Below demo threshold
Tool-call correctness
No wrong target
Human feedback
Tracked with edit distance