Per-tool Brier calibration scores updated quarterly. Per Constitution §9: if any tool's quarterly Brier degrades by more than 0.05, we pause it, publish the degradation here, notify every affected user, and refund the quarter.
Mock data · clearly labeled. Real Q2 numbers ship when verdict pipeline reaches production.
| Tool | Brier (Q1) | Band | Trend vs Q4 | Verdicts | |
|---|---|---|---|---|---|
| Severance Analyzeremployment · for the laid-off worker | 0.14 | well-calibrated | ↓ 0.02 (better) | ~1,400 mock | Publishes Q3 2026 |
| Contract Surgeoncontract review · for the receiving party | 0.17 | well-calibrated | ± 0.00 | ~890 mock | Publishes Q3 2026 |
| IRS Shieldtax positions · for the taxpayer | 0.22 | well-calibrated | ↓ 0.04 (better) | ~620 mock | Publishes Q3 2026 |
| Medical Bill Defenderbilling analysis · for the patient | 0.19 | well-calibrated | ↑ 0.03 (worse) | ~780 mock | Publishes Q3 2026 |
| Lease Analyzertenant rights · for the tenant | 0.20 | well-calibrated | ↓ 0.01 (better) | ~1,100 mock | Publishes Q3 2026 |
| Divorce Decoderprocess navigation · for the asking spouse | 0.28 | directional | ↓ 0.05 (better) | ~320 mock | Publishes Q3 2026 |
| Offer Letter Analyzercomp review · for the candidate | 0.13 | well-calibrated | ± 0.00 | ~2,100 mock | Publishes Q3 2026 |
| Insurance Claim Coachclaim filing · for the claimant | 0.24 | well-calibrated | ↑ 0.02 (worse) | ~540 mock | Publishes Q3 2026 |
| Demand Letter Proletter drafting · for the sender | 0.30 | directional | ↑ 0.06 (worse — auto-pause triggered) | ~210 mock | Publishes Q3 2026 |
| KAIROSlearning hubs · for the learner | 0.11 | well-calibrated | ↓ 0.03 (better) | ~3,400 mock | Publishes Q3 2026 |
NEEDS HUMAN REVIEW.Public-when-wrong is binding. Each sustained appeal publishes here (PII redacted unless user opts in).
Model panel converged on $48K severance as median; employer countered at $39K and held. Root cause: Levels.fyi cell had only 47 records for the user's role+region+tenure band (below 100-record threshold) but was still weighted at T3. Mitigation: tightened minimum-record threshold to 150 for low-volume bins; user refunded; flagged for adversarial-pass strengthening.
Model flagged CPT 99213 as upcoded from 99212; hospital provided documentation supporting the 99213 level. Root cause: substrate did not weight the hospital's documentation-pattern record (T2 regulatory source); panel under-routed to medical-coding specialist model. Mitigation: added documentation-pattern as required T2 input for E&M code disputes; user refunded.
The Brier methodology behind these scores is documented at /trust-v2/methodology. The binding commitment to publish them is Constitution §9. The appeal-process when we're wrong is /trust-v2/appeal.