Surface 3 · engineering substrate

How evidence becomes a verdict.

The math behind every AEQUARA output: a six-tier evidence ladder, per-source confidence and recency decay, multi-model panel routing, and band-thresholded action labels. This is the engineering substrate that Constitution §3 commits to.

The formula

final_weight = type_rank × source_confidence × recency_decay

Every source weighted into a verdict carries a final scalar weight in [0.00, 1.00]. The verdict is the panel-aggregated, confidence-weighted sum across all weighted sources. See the worked example in §4.
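A minimal sketch of the formula in TypeScript, assuming each factor has already been computed as a number in [0, 1]. The names `WeightFactors`, `assertUnit`, and `finalWeight` are illustrative and are not taken from the substrate code.

```typescript
// Hypothetical shape for the three factors in the formula above.
interface WeightFactors {
  typeRank: number;         // from the evidence ladder (§1)
  sourceConfidence: number; // per-source multiplier (§2)
  recencyDecay: number;     // per-claim decay factor (§3)
}

// Rejects values outside [0, 1] so the product stays in [0, 1] as well.
function assertUnit(name: string, value: number): void {
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be in [0, 1], got ${value}`);
  }
}

// final_weight = type_rank × source_confidence × recency_decay
function finalWeight(f: WeightFactors): number {
  assertUnit("typeRank", f.typeRank);
  assertUnit("sourceConfidence", f.sourceConfidence);
  assertUnit("recencyDecay", f.recencyDecay);
  return f.typeRank * f.sourceConfidence * f.recencyDecay;
}

// A T3 source with source-confidence 0.94 and no recency penalty.
const w = finalWeight({ typeRank: 0.7, sourceConfidence: 0.94, recencyDecay: 1 });
console.log(w.toFixed(3)); // prints "0.658"
```

Validating the inputs up front is a design choice of this sketch: it keeps the "weight in [0.00, 1.00]" guarantee local to one function rather than relying on every caller.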

§1 The six-tier evidence ladder

Every source weighted into a verdict starts with a type-rank in [0.00, 1.00]. The ladder reflects information-architecture quality: primary statutes and case law sit highest because they bind the question itself; synthetic LLM-generated comparables sit lowest because they cannot ground a claim on their own.

Tier | Type | Examples | Type-rank
---|---|---|---
T1 | Primary statutes + case law | NY Lab. Law §202-f · IRC §401(k) · controlling case-law precedent · FRCP · MUR | 1.00
T2 | Regulatory + agency guidance | IRS Pub. 590-A · DOL FAQs · EEOC enforcement guidance · FDA labeling · OCR HIPAA | 0.85
T3 | Vetted industry datasets | BLS OEWS · Levels.fyi (≥100 verified records / cell) · SHRM · MIT comp studies | 0.70
T4 | Editorial / journalistic | NYT · Bloomberg · WSJ · ProPublica · peer-reviewed Annual Reviews | 0.55
T5 | Crowd-sourced / forum | Reddit r/legaladvice · Glassdoor · Blind · /r/personalfinance | 0.30
T6 | Model-generated / synthetic | LLM training-data recall (no fresh source) · AI-generated comparables | 0.10

T6 sources are never shown as evidence alone. When the only weighted sources are T6, the verdict either declines (OUT OF SCOPE) or labels itself SYNTHETIC-ONLY — for orientation, not action. The label is required by the substrate; it is not a user-toggleable hide. See Constitution §6.2.
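The T6 rule above can be sketched as a small check over the tiers present in a source set. This is an illustration under stated assumptions: the `GROUNDED` label, the `allowSyntheticOnly` switch, and the treatment of an empty source set are hypothetical, since the text does not say what decides between declining and labelling.

```typescript
type Tier = "T1" | "T2" | "T3" | "T4" | "T5" | "T6";

// Type-ranks copied from the ladder in this section.
const TYPE_RANK: Record<Tier, number> = {
  T1: 1.0, T2: 0.85, T3: 0.7, T4: 0.55, T5: 0.3, T6: 0.1,
};

// "GROUNDED" is a hypothetical name for the ordinary case; the other two
// labels are the ones named in the text.
type EvidenceMode = "GROUNDED" | "SYNTHETIC-ONLY" | "OUT OF SCOPE";

// Returns how a verdict built on these sources may be presented.
// Assumption: `allowSyntheticOnly` stands in for whatever policy chooses
// between declining and labelling when only T6 sources are present.
// Assumption: an empty source set is treated as a decline.
function evidenceMode(tiers: Tier[], allowSyntheticOnly: boolean): EvidenceMode {
  if (tiers.length === 0) return "OUT OF SCOPE";
  const hasNonSynthetic = tiers.some((t) => t !== "T6");
  if (hasNonSynthetic) return "GROUNDED";
  return allowSyntheticOnly ? "SYNTHETIC-ONLY" : "OUT OF SCOPE";
}

console.log(TYPE_RANK.T6);                      // 0.1
console.log(evidenceMode(["T3", "T6"], false)); // "GROUNDED"
console.log(evidenceMode(["T6"], true));        // "SYNTHETIC-ONLY"
```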

§2 Source-confidence calculation

Source-confidence is a 0.0–1.0 multiplier on type-rank, computed per-source from three substrate signals:

Source-confidence is computed deterministically from these signals; we do not let the verdict model self-rate source confidence. The full computation lives in substrate/source-conf.ts and is gated against the test fixtures committed at substrate/__tests__/source-conf.test.ts.

§3 Per-claim recency decay

Recency decays linearly, with the decay rate depending on the tier:
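A minimal sketch of a linear decay factor, assuming age is measured in years and the factor is clamped at zero. The per-tier rates are not given here, so `ratePerYear` is a caller-supplied placeholder and the value used in the example is purely illustrative.

```typescript
// Linear decay: 1.0 for a fresh claim, falling by `ratePerYear` for each
// year of age, and clamped so it never drops below 0.
// Assumption: age in years; the actual unit and rates are not stated.
function recencyDecay(ageYears: number, ratePerYear: number): number {
  if (ageYears < 0 || ratePerYear < 0) {
    throw new RangeError("ageYears and ratePerYear must be non-negative");
  }
  return Math.max(0, 1 - ratePerYear * ageYears);
}

// Illustrative rate only: a one-year-old claim at 0.05 per year.
console.log(recencyDecay(1, 0.05).toFixed(2)); // prints "0.95"
```

Clamping at zero means a very old source contributes nothing to the weighted sum instead of a negative weight.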

§4 Worked example

Walk through a real-shape Severance Analyzer computation. Question: is the offered severance package fair?

Severance Analyzer verdict · 2026-06-01 · mock

"My employer offered $38K severance + 1mo COBRA after 7yr tenure. Is this fair?"

[T3] BLS OEWS 2026 Q1: NY metro, SOC-code-matched, 5-10yr tenure band. Median: $44K base. | 0.70 × 0.94 × 1.00 = 0.658
[T3] Levels.fyi negotiation dataset: 1,847 verified records, same industry+role+region. | 0.70 × 0.81 × 1.00 = 0.567
[T1] NY Lab. Law §202-f: non-compete enforceability statute, 2024 amendments. | 1.00 × 1.00 × 1.00 = 1.000
[T1] Doe v. Acme Corp. (S.D.N.Y. 2025 · illustrative): precedent narrowing non-compete enforceability. | 1.00 × 0.72 × 1.00 = 0.720
[T2] SHRM Severance Practice Survey 2025: 78% of >4wk packages include COBRA >1mo. | 0.85 × 0.86 × 0.95 = 0.694
Verdict: Package undervalues tenure by ~14% on base ($38K offered vs. $44K median), and 1mo COBRA falls short of the >1mo that 78% of >4wk packages include. Negotiate up. | panel-weighted conf 0.82

The verdict's final confidence (0.82) is then mapped to the action-threshold band — here SAFE TO SEND. The reasoning chain shown on the VerdictCard is exactly the source rows above, plus the model-panel votes (§5).
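The per-source weights in the worked example can be reproduced with a few lines of TypeScript. This sketch only recomputes the source weights; it does not reproduce the panel-weighted confidence of 0.82, because the aggregation rules are not spelled out here. The `rows` layout and the `round3` helper are illustrative.

```typescript
// Rows from the worked example:
// [label, type-rank, source-confidence, recency-decay]
const rows: Array<[string, number, number, number]> = [
  ["BLS OEWS 2026 Q1", 0.7, 0.94, 1.0],
  ["Levels.fyi negotiation dataset", 0.7, 0.81, 1.0],
  ["NY Lab. Law §202-f", 1.0, 1.0, 1.0],
  ["Doe v. Acme Corp. (illustrative)", 1.0, 0.72, 1.0],
  ["SHRM Severance Practice Survey 2025", 0.85, 0.86, 0.95],
];

// Round to three decimals, matching the precision shown in the example.
const round3 = (x: number): number => Math.round(x * 1000) / 1000;

const weights = rows.map(([label, rank, conf, decay]) => ({
  label,
  weight: round3(rank * conf * decay),
}));

for (const { label, weight } of weights) {
  console.log(`${label}: ${weight.toFixed(3)}`);
}
// Weights, in order: 0.658, 0.567, 1.000, 0.720, 0.694
```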

§5 Panel routing & vote aggregation

Every verdict is computed by a multi-model panel — minimum 3 LLMs from at least 2 distinct provider suites (e.g. Anthropic + OpenAI + Google). Each model independently consumes the weighted source set and returns its own verdict + per-finding confidence.
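The two composition constraints stated above (at least 3 models, at least 2 distinct providers) can be checked with a short function. The `PanelModel` shape and the model names in the example are hypothetical; only the two numeric constraints come from the text.

```typescript
// Hypothetical description of one panel member.
interface PanelModel {
  provider: string; // e.g. "Anthropic", "OpenAI", "Google"
  model: string;    // illustrative model name
}

// True when the panel has at least 3 models drawn from at least
// 2 distinct providers.
function isValidPanel(panel: PanelModel[]): boolean {
  const providers = new Set(panel.map((m) => m.provider));
  return panel.length >= 3 && providers.size >= 2;
}

const mixedPanel: PanelModel[] = [
  { provider: "Anthropic", model: "model-a" },
  { provider: "OpenAI", model: "model-b" },
  { provider: "Google", model: "model-c" },
];
const singleProviderPanel: PanelModel[] = [
  { provider: "OpenAI", model: "model-x" },
  { provider: "OpenAI", model: "model-y" },
  { provider: "OpenAI", model: "model-z" },
];

console.log(isValidPanel(mixedPanel));          // true
console.log(isValidPanel(singleProviderPanel)); // false
```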

The aggregation rules:

Per-tool panel composition publishes alongside the first real Track Record cycle (Q3 2026). Model identity is disclosed by suite (Anthropic Opus, Google Gemini-Pro, OpenAI GPT-4o, etc.) on every VerdictCard. Abstentions are shown as abstentions, not hidden.

§6 Action thresholds

Confidence scores are never shown as raw percentages without their action label. Per Perplexity-grounded SOTA research: "tie to action thresholds, not raw percentages."

Confidence band | Action label | Meaning
---|---|---
0% – 49% | DON'T SEND | Insufficient evidence for any verdict. AEQUARA declines. Recommend human review or refusal.
50% – 79% | NEEDS HUMAN REVIEW | Tentative verdict but panel did not converge. Treat as directional input, not a decision.
80% – 100% | SAFE TO SEND | Panel converged. Brier-calibrated. Use as your verdict, with the override button always available.

Per-tool thresholds are tuned to that tool's historical Brier — Severance Analyzer's 80% bar is calibrated against its own miss rate. Published per-tool at /trust-v2/track-record.
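A minimal sketch of the band mapping, assuming confidence is a fraction in [0, 1] and each band's lower bound is inclusive. The `Thresholds` type and the idea of passing per-tool cut-offs as a parameter are assumptions for illustration; the defaults mirror the three bands in this section.

```typescript
type ActionLabel = "DON'T SEND" | "NEEDS HUMAN REVIEW" | "SAFE TO SEND";

// Band cut-offs as fractions. Hypothetical structure: per-tool tuning
// would supply different values.
interface Thresholds {
  review: number; // lowest confidence labelled NEEDS HUMAN REVIEW
  safe: number;   // lowest confidence labelled SAFE TO SEND
}

const DEFAULT_THRESHOLDS: Thresholds = { review: 0.5, safe: 0.8 };

// Maps a confidence in [0, 1] to its action label.
function actionLabel(
  confidence: number,
  t: Thresholds = DEFAULT_THRESHOLDS,
): ActionLabel {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  if (confidence >= t.safe) return "SAFE TO SEND";
  if (confidence >= t.review) return "NEEDS HUMAN REVIEW";
  return "DON'T SEND";
}

console.log(actionLabel(0.82)); // "SAFE TO SEND"
console.log(actionLabel(0.65)); // "NEEDS HUMAN REVIEW"
console.log(actionLabel(0.3));  // "DON'T SEND"
```

Returning the label from the same function that validates the score keeps a raw number from being displayed without its band.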

§7 What this methodology does NOT do

Honesty about limits

The methodology does not:

Brand-level principles (not the per-tool math) live at /methodology. The full charter that this methodology operationalizes is Decision Constitution v1.0. The output of this methodology is what powers every VerdictCard on /decide.
