Release qualification report
Pre-release
run details
1. Release history
Pass rate and token cost across every qualified release, oldest to newest. Hover any point for its full detail.
[Charts: Pass rate; Tokens per correct run; Cost vs correctness]
2. Token distribution
Every run in this qualification, placed by total tokens spent. Click a point to inspect the run.
3. Category performance
Token cost per case tag. The box and dots show the token cost of correct runs (quartiles when there are enough runs); color shows the pass rate. Click a row to filter the run explorer.
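The per-tag summary described above can be sketched roughly as follows. This is an illustration only, not the report's actual implementation: the run record shape (`tags`, `correct`, `tokens`), the function name `category_summary`, and the threshold `MIN_RUNS_FOR_QUARTILES` are all assumptions, since the report does not say how many correct runs count as "enough" for quartiles.

```python
from collections import defaultdict
from statistics import quantiles

# Assumed threshold; the report does not state how many runs are "enough".
MIN_RUNS_FOR_QUARTILES = 4


def category_summary(runs):
    """Summarize runs per case tag.

    Each run is a hypothetical dict: {"tags": [...], "correct": bool, "tokens": int}.
    Returns {tag: {"pass_rate", "correct_tokens", "quartiles"}}.
    """
    by_tag = defaultdict(list)
    for run in runs:
        for tag in run["tags"]:
            by_tag[tag].append(run)

    summary = {}
    for tag, tagged in by_tag.items():
        # Token cost is taken over correct runs only, as in the chart.
        correct_tokens = sorted(r["tokens"] for r in tagged if r["correct"])
        quartile_values = None
        if len(correct_tokens) >= MIN_RUNS_FOR_QUARTILES:
            # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
            quartile_values = tuple(quantiles(correct_tokens, n=4))
        summary[tag] = {
            "pass_rate": len(correct_tokens) / len(tagged),  # drives the color
            "correct_tokens": correct_tokens,                # drawn as dots
            "quartiles": quartile_values,                    # drawn as a box, if any
        }
    return summary
```

With too few correct runs for a tag, `quartiles` stays `None` and only the individual dots would be drawn.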
4. Questions across runs
Each question's results across the comparison history, oldest to newest. The final column holds this run's cells; click one for the full grader verdict.
5. Run explorer
Filtering is part of the analysis: the aggregate line is recomputed over the runs that match the current filter.
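One way to picture an aggregate that follows the filter is the minimal sketch below. The record shape and the names `filter_runs` and `aggregate` are hypothetical, and "tokens per correct run" is assumed here to mean the mean token count over correct runs; the report does not define it, and it could instead mean total tokens divided by the number of correct runs.

```python
def filter_runs(runs, tag=None, correct=None):
    """Keep runs matching the given tag and/or correctness (None means no constraint)."""
    selected = []
    for run in runs:
        if tag is not None and tag not in run["tags"]:
            continue
        if correct is not None and run["correct"] != correct:
            continue
        selected.append(run)
    return selected


def aggregate(runs):
    """Compute the aggregate line over exactly the runs passed in."""
    correct = [r for r in runs if r["correct"]]
    return {
        "runs": len(runs),
        "pass_rate": len(correct) / len(runs) if runs else None,
        # Assumed definition: mean tokens over correct runs only.
        "tokens_per_correct_run": (
            sum(r["tokens"] for r in correct) / len(correct) if correct else None
        ),
    }


# Hypothetical data: the aggregate changes as the filter narrows the run set.
example_runs = [
    {"tags": ["math"], "correct": True, "tokens": 100},
    {"tags": ["math"], "correct": False, "tokens": 900},
    {"tags": ["code"], "correct": True, "tokens": 300},
]
unfiltered = aggregate(example_runs)
math_only = aggregate(filter_runs(example_runs, tag="math"))
```

Because `aggregate` only ever sees the filtered list, the numbers it reports describe the current selection rather than the whole qualification.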