How evidence becomes a verdict.

The math behind every AEQUARA output: a six-tier evidence ladder, per-source confidence and recency decay, multi-model panel routing, and band-thresholded action labels. This is the engineering substrate that Constitution §3 commits to.

The formula

final_weight = type_rank × source_confidence × recency_decay

Every source weighted into a verdict carries a final scalar weight in [0.00, 1.00]. The verdict is the panel-aggregated, confidence-weighted sum across all weighted sources. See the worked example in §4.
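A minimal TypeScript sketch of the formula, in the same language as the substrate's `source-conf.ts`. The names `finalWeight` and `checkUnit` and the range check are illustrative assumptions, not the substrate's actual API.

```typescript
// Hypothetical guard: reject anything outside [0, 1] (including NaN)
// so one bad factor cannot silently skew a verdict.
function checkUnit(name: string, value: number): number {
  if (!(value >= 0 && value <= 1)) {
    throw new RangeError(`${name} must be in [0, 1], got ${value}`);
  }
  return value;
}

// final_weight = type_rank × source_confidence × recency_decay.
// Every factor is in [0, 1], so the product is too.
function finalWeight(typeRank: number, sourceConfidence: number, recencyDecay: number): number {
  return (
    checkUnit("typeRank", typeRank) *
    checkUnit("sourceConfidence", sourceConfidence) *
    checkUnit("recencyDecay", recencyDecay)
  );
}

// The BLS OEWS row from the worked example in §4: 0.70 × 0.94 × 1.00.
console.log(finalWeight(0.7, 0.94, 1.0).toFixed(3)); // 0.658
```

Because the three factors multiply, a weak score on any one of them pulls the whole weight down; no factor can compensate for another.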

§1 The six-tier evidence ladder

Every source weighted into a verdict starts with a type-rank in [0.00, 1.00]. The ladder reflects information-architecture quality: primary statutes and case law sit highest because they bind the question itself; synthetic LLM-generated comparables sit lowest because they cannot ground a claim on their own.

Tier	Type	Examples	Type-rank
T1	Primary statutes + case law	NY Lab. Law §202-f · IRC §401(k) · controlling case-law precedent · FRCP · MUR	1.00
T2	Regulatory + agency guidance	IRS Pub. 590-A · DOL FAQs · EEOC enforcement guidance · FDA labeling · OCR HIPAA	0.85
T3	Vetted industry datasets	BLS OEWS · Levels.fyi (≥100 verified records / cell) · SHRM · MIT comp studies	0.70
T4	Editorial / journalistic	NYT · Bloomberg · WSJ · ProPublica · peer-reviewed Annual Reviews	0.55
T5	Crowd-sourced / forum	Reddit r/legaladvice · Glassdoor · Blind · /r/personalfinance	0.30
T6	Model-generated / synthetic	LLM training-data recall (no fresh source) · AI-generated comparables	0.10

T6 sources are never shown as evidence on their own. When the only weighted sources are T6, the verdict either declines (OUT OF SCOPE) or labels itself SYNTHETIC-ONLY, meaning it is for orientation, not action. The substrate requires this label; users cannot toggle it off. See Constitution §6.2.
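The ladder is a fixed lookup, and the T6 rule is a check over the whole source set rather than over any single source. A sketch, assuming each source is tagged with a tier string; the `Source` shape and `classifyEvidence` name are hypothetical. Which of the two T6-only outcomes applies (decline or SYNTHETIC-ONLY) is left to the caller, because the text does not say how that choice is made.

```typescript
type Tier = "T1" | "T2" | "T3" | "T4" | "T5" | "T6";

// Type-ranks from the six-tier evidence ladder.
const TYPE_RANK: Record<Tier, number> = {
  T1: 1.0,  // primary statutes + case law
  T2: 0.85, // regulatory + agency guidance
  T3: 0.7,  // vetted industry datasets
  T4: 0.55, // editorial / journalistic
  T5: 0.3,  // crowd-sourced / forum
  T6: 0.1,  // model-generated / synthetic
};

// Hypothetical source record: only the tier matters for this check.
interface Source {
  id: string;
  tier: Tier;
}

type EvidenceStatus = "NO_SOURCES" | "SYNTHETIC_ONLY" | "GROUNDED";

// A verdict can only be grounded if at least one source sits above T6.
function classifyEvidence(sources: readonly Source[]): EvidenceStatus {
  if (sources.length === 0) return "NO_SOURCES";
  return sources.every((s) => s.tier === "T6") ? "SYNTHETIC_ONLY" : "GROUNDED";
}

const recallOnly: Source[] = [{ id: "llm-recall", tier: "T6" }];
const mixed: Source[] = [...recallOnly, { id: "irs-pub-590a", tier: "T2" }];
console.log(classifyEvidence(recallOnly), TYPE_RANK.T6); // SYNTHETIC_ONLY 0.1
console.log(classifyEvidence(mixed)); // GROUNDED
```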

§2 Source-confidence calculation

Source-confidence is a 0.0–1.0 multiplier on type-rank, computed per-source from three substrate signals:

Vintage stability — has the source been amended, retracted, or superseded since its last verified citation? Stable sources hold 1.0; recently-amended sources decay until the amendment is digested into the substrate.
Cross-reference density — how often the source is cited by other T1/T2 sources of the same domain. Sources cited by senior authorities gain confidence; isolated sources lose it.
Specimen alignment — does the source's stated scope (jurisdiction, time, subject) overlap the question being asked? Out-of-scope sources are downweighted.

Source-confidence is computed deterministically from these signals; we do not let the verdict model self-rate source confidence. The full computation lives in substrate/source-conf.ts and is gated against the test fixtures committed at substrate/__tests__/source-conf.test.ts.
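The combination rule inside `substrate/source-conf.ts` is not published on this page. The sketch below assumes each of the three signals has already been turned into a penalty multiplier in [0, 1] (1.0 meaning "no penalty") and simply multiplies them; the product rule, the `SourceSignals` shape, and the function names are illustrative assumptions. What it does show is the property the paragraph insists on: the result is a pure, deterministic function of substrate signals, with no model in the loop.

```typescript
// Hypothetical signal bundle. Each field is a penalty multiplier in [0, 1],
// where 1.0 means "no penalty" (e.g. a stable, never-amended source).
interface SourceSignals {
  vintageStability: number;      // drops while an amendment is not yet digested
  crossReferenceDensity: number; // drops for sources few T1/T2 authorities cite
  specimenAlignment: number;     // drops on jurisdiction, time, or subject mismatch
}

function checkUnit(name: string, x: number): number {
  if (!(x >= 0 && x <= 1)) throw new RangeError(`${name} must be in [0, 1], got ${x}`);
  return x;
}

// Assumed combination rule: a plain product of the three multipliers.
// Pure and deterministic, so no verdict model can influence the result.
function sourceConfidence(s: SourceSignals): number {
  return (
    checkUnit("vintageStability", s.vintageStability) *
    checkUnit("crossReferenceDensity", s.crossReferenceDensity) *
    checkUnit("specimenAlignment", s.specimenAlignment)
  );
}

// A stable, well-cited source that is only partly in scope for the question.
const conf = sourceConfidence({ vintageStability: 1.0, crossReferenceDensity: 0.9, specimenAlignment: 0.8 });
console.log(conf.toFixed(2)); // 0.72
```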

§3 Per-claim recency decay

Recency decays linearly, with the decay rate depending on the tier:

T1 primary statute — published 5+ years ago, not amended: retains 1.0 recency. Statute longevity is a feature, not a bug.
T2 regulatory guidance — >18 months old without re-confirmation: 0.7. The faster guidance updates, the faster stale guidance decays.
T3 industry dataset — quarter-by-quarter decay; the latest BLS OEWS quarter retains 1.0, two quarters back decays to 0.85, etc.
T4 editorial — news cycle decay; same-month source retains 1.0, 6+ months back decays to 0.6.
T5 forum / crowd — >12 months old: 0.4. Crowd evidence ages fast; today's reddit is rarely yesterday's reddit.
T6 synthetic — recency irrelevant (already capped at 0.10 type-rank).
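The tier rules above can be read as a linear ramp from 1.0 down to a floor. A sketch, assuming age is measured in whole months since publication (since the last re-confirmation, for T2) and that each tier's stated value is where its ramp bottoms out. The per-month rates are interpolated from the stated endpoints, and the T3 floor of 0.0 is an outright assumption, since the text only says "etc."; none of these are published parameters.

```typescript
type Tier = "T1" | "T2" | "T3" | "T4" | "T5" | "T6";

// Assumed decay parameters, interpolated from the endpoints stated in §3.
interface DecayRule {
  perMonth: number;   // recency lost per month of age
  floor: number;      // recency never falls below this value
  stepMonths: number; // age counts in whole steps (1 = monthly, 3 = quarterly)
}

const DECAY_RULES: Record<Tier, DecayRule> = {
  T1: { perMonth: 0, floor: 1.0, stepMonths: 1 },         // unamended statute keeps 1.0
  T2: { perMonth: 0.3 / 18, floor: 0.7, stepMonths: 1 },  // 0.7 at 18 months
  T3: { perMonth: 0.075 / 3, floor: 0.0, stepMonths: 3 }, // -0.075 per quarter; floor assumed
  T4: { perMonth: 0.4 / 6, floor: 0.6, stepMonths: 1 },   // 0.6 at 6 months
  T5: { perMonth: 0.6 / 12, floor: 0.4, stepMonths: 1 },  // 0.4 at 12 months
  T6: { perMonth: 0, floor: 1.0, stepMonths: 1 },         // recency ignored; type-rank caps it
};

function recencyDecay(tier: Tier, ageMonths: number): number {
  const rule = DECAY_RULES[tier];
  // Count only completed steps, so a same-month (or same-quarter) source keeps 1.0.
  const steppedAge = Math.floor(Math.max(0, ageMonths) / rule.stepMonths) * rule.stepMonths;
  return Math.max(rule.floor, 1 - rule.perMonth * steppedAge);
}

console.log(recencyDecay("T3", 6).toFixed(2));  // 0.85 (two quarters back)
console.log(recencyDecay("T4", 9).toFixed(2));  // 0.60 (past the 6-month point)
console.log(recencyDecay("T1", 84).toFixed(2)); // 1.00 (7-year-old unamended statute)
```

Keeping the rates in one table makes each tier's assumption visible and easy to replace once the real parameters are published.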

§4 Worked example

Walk through a realistically shaped Severance Analyzer computation. Question: is the offered severance package fair?

Severance Analyzer verdict · 2026-06-01 · mock

"My employer offered $38K severance + 1mo COBRA after 7yr tenure. Is this fair?"

[T3] BLS OEWS 2026 Q1 (NY metro, SOC-code-matched, 5–10yr tenure band). Median: $44K base. 0.70 × 0.94 × 1.00 = 0.658

[T3] Levels.fyi negotiation dataset (1,847 verified records, same industry + role + region). 0.70 × 0.81 × 1.00 = 0.567

[T1] NY Lab. Law §202-f (non-compete enforceability statute, 2024 amendments). 1.00 × 1.00 × 1.00 = 1.000

[T1] Doe v. Acme Corp. (S.D.N.Y. 2025, illustrative): precedent narrowing non-compete enforceability. 1.00 × 0.72 × 1.00 = 0.720

[T3] SHRM Severance Practice Survey 2025 (78% of >4-week packages include more than 1 month of COBRA). 0.70 × 0.86 × 0.95 = 0.572

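Verdict: the package undervalues tenure by ~14% on base ($38K offered vs. the $44K median), and 1 month of COBRA falls short of standard practice (78% of comparable packages include more than 1 month). Negotiate up. Panel-weighted confidence: 0.82.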

The verdict's final confidence (0.82) is then mapped to its action-threshold band: here, SAFE TO SEND, because 0.82 clears the 80% bar (§6). The reasoning chain shown on the VerdictCard is exactly the source rows above, plus the model-panel votes (§5).
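The row weights and the headline gap in the mock example are plain arithmetic and can be rechecked in a few lines. This sketch covers the BLS, Levels.fyi, statute, and case-law rows; the panel-weighted 0.82 depends on the panel votes described in §5 and is not recomputed here.

```typescript
// Mock rows from the Severance Analyzer example:
// [source, typeRank, sourceConfidence, recencyDecay]
const rows: Array<[string, number, number, number]> = [
  ["BLS OEWS 2026 Q1", 0.7, 0.94, 1.0],
  ["Levels.fyi negotiation dataset", 0.7, 0.81, 1.0],
  ["NY Lab. Law §202-f", 1.0, 1.0, 1.0],
  ["Doe v. Acme Corp. (illustrative)", 1.0, 0.72, 1.0],
];

// final_weight = type_rank × source_confidence × recency_decay, shown to 3 decimals.
for (const [source, rank, conf, recency] of rows) {
  console.log(`${source}: ${(rank * conf * recency).toFixed(3)}`);
}

// Headline gap: $38K offered against the $44K median base.
const offered = 38000;
const median = 44000;
const shortfall = (median - offered) / median;
console.log(`shortfall vs. median: ${(shortfall * 100).toFixed(1)}%`); // shortfall vs. median: 13.6%
```

The exact shortfall is 13.6%, which the verdict rounds to ~14%.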

§5 Panel routing & vote aggregation

Every verdict is computed by a multi-model panel — minimum 3 LLMs from at least 2 distinct provider suites (e.g. Anthropic + OpenAI + Google). Each model independently consumes the weighted source set and returns its own verdict + per-finding confidence.
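The composition rule is an invariant that can be checked before any model runs. A sketch, assuming each panel member is described by a provider-suite string and a model name; the `PanelMember` shape and the `distinct` and `isValidPanel` names are hypothetical.

```typescript
// Hypothetical panel member description.
interface PanelMember {
  suite: string; // provider suite, e.g. "Anthropic"
  model: string; // model within that suite, e.g. "Opus"
}

// Keep the first occurrence of each string.
function distinct(values: readonly string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i);
}

// Panel rule: at least 3 distinct models from at least 2 distinct provider suites.
function isValidPanel(panel: readonly PanelMember[]): boolean {
  const models = distinct(panel.map((m) => `${m.suite}/${m.model}`));
  const suites = distinct(panel.map((m) => m.suite));
  return models.length >= 3 && suites.length >= 2;
}

const panel: PanelMember[] = [
  { suite: "Anthropic", model: "Opus" },
  { suite: "OpenAI", model: "GPT-4o" },
  { suite: "Google", model: "Gemini-Pro" },
];
console.log(isValidPanel(panel));             // true
console.log(isValidPanel(panel.slice(0, 2))); // false: only 2 models
```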

The aggregation rules:

Convergence: ≥2 of 3 models agree on the verdict AND each contributing model's per-finding confidence is ≥0.80 → SAFE TO SEND
Soft agreement: ≥2 of 3 models agree, but per-finding confidence falls in [0.50, 0.80) → NEEDS HUMAN REVIEW
Panel split: a 1-1-1 split, OR exactly 1 model returns INSUFFICIENT_CONTEXT → NEEDS HUMAN REVIEW, with explicit panel-split disclosure on the card
Decline-quorum: ≥2 of 3 models return INSUFFICIENT_CONTEXT or OUT_OF_SCOPE → verdict declines (OUT OF SCOPE)
Disagreement penalty: models disagree AND per-finding confidence varies by >0.30 between models → verdict downgraded one band; panel-disagreement footnote required
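The aggregation rules can be sketched as one function over three votes. Assumptions, none of them stated in the rules themselves: the decline-quorum is checked first; a single OUT_OF_SCOPE vote is treated like a single INSUFFICIENT_CONTEXT; "agree" means identical verdict strings; the 0.30 spread is measured across all cast confidences; a one-band downgrade moves SAFE TO SEND to NEEDS HUMAN REVIEW and NEEDS HUMAN REVIEW to DON'T SEND; and a majority whose confidence is below 0.50 (a case the rules do not cover) yields DON'T SEND. All type and function names are hypothetical.

```typescript
type VerdictVote = { kind: "VERDICT"; verdict: string; confidence: number };
type Vote = VerdictVote | { kind: "INSUFFICIENT_CONTEXT" } | { kind: "OUT_OF_SCOPE" };
type Outcome = "SAFE TO SEND" | "NEEDS HUMAN REVIEW" | "DON'T SEND" | "OUT OF SCOPE";

interface PanelResult {
  outcome: Outcome;
  panelSplitDisclosure: boolean; // card must disclose a split or abstention
  disagreementFootnote: boolean; // card must carry the panel-disagreement footnote
}

// Assumed one-band downgrade ladder for the disagreement penalty.
const DOWNGRADE: Record<Outcome, Outcome> = {
  "SAFE TO SEND": "NEEDS HUMAN REVIEW",
  "NEEDS HUMAN REVIEW": "DON'T SEND",
  "DON'T SEND": "DON'T SEND",
  "OUT OF SCOPE": "OUT OF SCOPE",
};

function aggregate(votes: readonly [Vote, Vote, Vote]): PanelResult {
  const cast = votes.filter((v): v is VerdictVote => v.kind === "VERDICT");
  const abstentions = votes.length - cast.length;

  // Decline-quorum: two or more INSUFFICIENT_CONTEXT / OUT_OF_SCOPE votes.
  if (abstentions >= 2) return { outcome: "OUT OF SCOPE", panelSplitDisclosure: false, disagreementFootnote: false };
  // A single abstention goes to review with disclosure.
  if (abstentions === 1) return { outcome: "NEEDS HUMAN REVIEW", panelSplitDisclosure: true, disagreementFootnote: false };

  // Group confidences by identical verdict; the largest group is the majority.
  const groups: Record<string, number[]> = {};
  for (const v of cast) (groups[v.verdict] = groups[v.verdict] || []).push(v.confidence);
  const sorted = Object.keys(groups).map((k) => groups[k]).sort((a, b) => b.length - a.length);
  const majority = sorted[0];

  // Panel split (1-1-1): no two models agree.
  if (majority.length < 2) return { outcome: "NEEDS HUMAN REVIEW", panelSplitDisclosure: true, disagreementFootnote: false };

  // Convergence needs every agreeing model at >= 0.80; soft agreement covers [0.50, 0.80).
  const weakest = Math.min(...majority);
  let outcome: Outcome = weakest >= 0.8 ? "SAFE TO SEND" : weakest >= 0.5 ? "NEEDS HUMAN REVIEW" : "DON'T SEND";

  // Disagreement penalty: a dissenting model plus a confidence spread above 0.30.
  const confidences = cast.map((v) => v.confidence);
  const penalised = sorted.length > 1 && Math.max(...confidences) - Math.min(...confidences) > 0.3;
  if (penalised) outcome = DOWNGRADE[outcome];

  return { outcome, panelSplitDisclosure: false, disagreementFootnote: penalised };
}

const converged = aggregate([
  { kind: "VERDICT", verdict: "NEGOTIATE", confidence: 0.86 },
  { kind: "VERDICT", verdict: "NEGOTIATE", confidence: 0.82 },
  { kind: "VERDICT", verdict: "ACCEPT", confidence: 0.7 },
]);
console.log(converged.outcome); // SAFE TO SEND
```

Checking the decline-quorum before the single-abstention rule is what keeps the two abstention rules from overlapping.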

Per-tool panel composition is published alongside the first real Track Record cycle (Q3 2026). Model identity is disclosed by suite (Anthropic Opus, Google Gemini-Pro, OpenAI GPT-4o, etc.) on every VerdictCard. Abstentions are shown as abstentions, not hidden.

§6 Action thresholds

Confidence scores are never shown as raw percentages without their action label. Per Perplexity-grounded SOTA research: "tie to action thresholds, not raw percentages."

Confidence	Action label	Meaning
0% to under 50%	DON'T SEND	Insufficient evidence for any verdict. AEQUARA declines and recommends human review or refusal.
50% to under 80%	NEEDS HUMAN REVIEW	Tentative verdict; the panel did not converge. Treat as directional input, not a decision.
80% to 100%	SAFE TO SEND	Panel converged; Brier-calibrated. Use as your verdict, with the override button always available.
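Mapping a confidence score to its label is a lookup over three bands. A minimal sketch, assuming confidence arrives as a number in [0, 1], that the bands are half-open at the 0.50 and 0.80 cut points (so 0.795 counts as review, not safe), and that a tool's tuned thresholds can replace those defaults. The `actionLabel` and `Thresholds` names are hypothetical.

```typescript
type ActionLabel = "DON'T SEND" | "NEEDS HUMAN REVIEW" | "SAFE TO SEND";

// Default cut points from the band table; a tool may tune its own.
interface Thresholds {
  review: number; // lowest confidence that earns NEEDS HUMAN REVIEW
  safe: number;   // lowest confidence that earns SAFE TO SEND
}

const DEFAULT_THRESHOLDS: Thresholds = { review: 0.5, safe: 0.8 };

// Half-open bands: [0, review) → DON'T SEND, [review, safe) → REVIEW, [safe, 1] → SAFE.
function actionLabel(confidence: number, t: Thresholds = DEFAULT_THRESHOLDS): ActionLabel {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= t.safe) return "SAFE TO SEND";
  if (confidence >= t.review) return "NEEDS HUMAN REVIEW";
  return "DON'T SEND";
}

// The label always travels with the number, never the number alone.
const score = 0.82;
console.log(`${actionLabel(score)} (${Math.round(score * 100)}%)`); // SAFE TO SEND (82%)
```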

Per-tool thresholds are tuned to each tool's historical Brier score; for example, Severance Analyzer's 80% bar is calibrated against its own miss rate. Each tool's thresholds are published at /trust-v2/track-record.
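The Brier score behind these thresholds is the mean squared gap between stated confidence and the eventual 0/1 outcome, so lower is better. A sketch, assuming each resolved verdict is logged as a (confidence, correct) pair. The `Resolved` shape, the function names, and the choice to read the miss rate only among verdicts at or above the bar are illustrative assumptions, since this page does not spell out the tuning procedure. The history below is made-up toy data.

```typescript
// Hypothetical record of a verdict whose real-world outcome is now known.
interface Resolved {
  confidence: number; // panel confidence in [0, 1] at verdict time
  correct: boolean;   // did the verdict hold up?
}

// Brier score: mean of (p - o)^2, with o = 1 for a correct verdict, 0 otherwise.
function brierScore(history: readonly Resolved[]): number {
  if (history.length === 0) throw new Error("no resolved verdicts");
  const total = history.reduce((sum, r) => sum + (r.confidence - (r.correct ? 1 : 0)) ** 2, 0);
  return total / history.length;
}

// Share of verdicts at or above a threshold that turned out wrong.
function missRateAbove(history: readonly Resolved[], threshold: number): number {
  const above = history.filter((r) => r.confidence >= threshold);
  if (above.length === 0) return 0;
  return above.filter((r) => !r.correct).length / above.length;
}

// Toy history, not real track-record data.
const history: Resolved[] = [
  { confidence: 0.9, correct: true },
  { confidence: 0.85, correct: true },
  { confidence: 0.8, correct: false },
  { confidence: 0.6, correct: true },
];
console.log(brierScore(history).toFixed(4)); // 0.2081
console.log(missRateAbove(history, 0.8));    // 0.3333333333333333
```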

§7 What this methodology does NOT do

Honesty about limits

The methodology does not:

Substitute for licensed-professional judgment. Where the question requires an attorney, CPA, MD, or RIA's authority, AEQUARA refers users to counsel. See Constitution §6.1.
Eliminate user-side asymmetry bias. AEQUARA represents the user, not the counterparty. Mitigated by the adversarial-pass model in Constitution §7, but not zero.
Claim equal quality across all jurisdictions and languages. Per-jurisdiction and per-language Brier scores are published separately; outside the 5 best-calibrated states or 6 supported languages, the verdict carries an explicit limitation flag.
Hide miscalibration. When a tool's quarterly Brier score drifts by more than 0.05, the tool auto-pauses and the degradation is published. See /trust-v2/track-record.
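Read as a drift rule, the auto-pause check compares a tool's quarterly Brier score with a calibrated baseline. A sketch, assuming "0.05 drift" means the quarterly score worsens (rises, since Brier is lower-is-better) by more than 0.05 over that baseline. The baseline itself, the `QuarterCalibration` shape, the `shouldAutoPause` name, the tool id, and the choice to ignore improvements are all assumptions.

```typescript
// Hypothetical quarterly calibration record for one tool.
interface QuarterCalibration {
  tool: string;
  baselineBrier: number; // score the tool's thresholds were calibrated against
  quarterBrier: number;  // score measured over the latest quarter
}

const MAX_DRIFT = 0.05;

// Pause when the quarterly Brier score worsens by more than MAX_DRIFT.
// Brier is lower-is-better, so "worse" means a higher score.
function shouldAutoPause(c: QuarterCalibration): boolean {
  return c.quarterBrier - c.baselineBrier > MAX_DRIFT;
}

// Hypothetical tool id and scores.
const check: QuarterCalibration = { tool: "severance-analyzer", baselineBrier: 0.12, quarterBrier: 0.19 };
if (shouldAutoPause(check)) {
  console.log(`${check.tool}: paused, Brier drift ${(check.quarterBrier - check.baselineBrier).toFixed(2)}`);
}
```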

Brand-level principles (not the per-tool math) live at /methodology. The full charter that this methodology operationalizes is Decision Constitution v1.0. The output of this methodology is what powers every VerdictCard on /decide.
