🐼 Panda · Release qualification

Release qualification report


1. Release history

Pass rate and token cost across every qualified release, oldest to newest. Hover any point for its full detail.

[Charts: Pass rate · Tokens per correct run · Cost vs correctness]

2. Token distribution

Every run in this qualification, placed by total tokens spent. Click a point to inspect the run.

3. Category performance

Token cost per case tag. The box or dots show the token cost of correct runs (a quartile box when there are enough runs), and color shows the pass rate. Click a row to filter the run explorer.
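A per-tag summary of this kind could be computed roughly as follows. This is a minimal sketch, not the report's actual implementation: the run record fields (`tags`, `tokens`, `correct`), the function name `summarize_by_tag`, and the threshold `MIN_RUNS_FOR_QUARTILES` are all assumptions made for illustration, as is the choice of the inclusive quartile method.

```python
from collections import defaultdict
from statistics import quantiles

# Assumed threshold: the report only says quartiles appear "when there are enough".
MIN_RUNS_FOR_QUARTILES = 4


def summarize_by_tag(runs):
    """Group runs by case tag and summarize pass rate and correct-run token cost.

    Each run is a dict with hypothetical keys:
      "tags"    -> list of case tags
      "tokens"  -> total tokens spent by the run
      "correct" -> True if the grader passed the run
    """
    by_tag = defaultdict(list)
    for run in runs:
        for tag in run["tags"]:
            by_tag[tag].append(run)

    summary = {}
    for tag, tagged in by_tag.items():
        # Only correct runs contribute to the token-cost box or dots.
        correct_tokens = sorted(r["tokens"] for r in tagged if r["correct"])
        if len(correct_tokens) >= MIN_RUNS_FOR_QUARTILES:
            q1, median, q3 = quantiles(correct_tokens, n=4, method="inclusive")
            box = (q1, median, q3)
        else:
            box = None  # too few correct runs: show individual dots instead
        summary[tag] = {
            "pass_rate": len(correct_tokens) / len(tagged),
            "correct_tokens": correct_tokens,
            "quartiles": box,
        }
    return summary


example_runs = [
    {"tags": ["math"], "tokens": 100, "correct": True},
    {"tags": ["math"], "tokens": 200, "correct": True},
    {"tags": ["math"], "tokens": 300, "correct": True},
    {"tags": ["math"], "tokens": 400, "correct": True},
    {"tags": ["math"], "tokens": 500, "correct": False},
    {"tags": ["io"], "tokens": 50, "correct": True},
]
example_summary = summarize_by_tag(example_runs)
```

In this sketch a tag with fewer than four correct runs keeps its raw token counts and has no quartile box, which mirrors the box-or-dots distinction in the chart.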

4. Questions across runs

Each question across the comparison history, oldest to newest. The final column holds this run's cells; click one for the full grader verdict.

5. Run explorer

Filtering is part of the analysis: the aggregate line updates to reflect the current filter.