Glossary

The vocabulary of verifiable judgment

AEQUARA scores AI on one property — calibration — and attests the result so it can be checked, not just trusted. These are the terms behind that, in plain English.

Every term AEQUARA acts on links to its primary source. See the methodology, the Brier explainer, and the proof ledger.

Calibration
Does confidence match reality?
A forecaster is calibrated when the things it says are 70% likely happen about 70% of the time. Calibration is a property of confidence, not of any single answer — a calibrated model can still be wrong, it just isn’t systematically over- or under-sure. It is the one property AEQUARA measures and attests.
Primary source →
Calibration gap
Average confidence minus accuracy.
A single signed number: average stated confidence minus actual accuracy. Positive means overconfident (sure more often than right); negative means underconfident; zero is perfect calibration. The free Calibration Scorecard computes yours from your own answers.
Primary source →
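As a rough illustration of the arithmetic, here is a minimal Python sketch. The function name and the sample answers are made up for this example and are not taken from the Scorecard itself.

```python
def calibration_gap(confidences, correct):
    """Average stated confidence minus observed accuracy (signed)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(1 for hit in correct if hit) / len(correct)
    return mean_confidence - accuracy


# Hypothetical answers: stated confidence, and whether each answer was right.
confidences = [0.9, 0.8, 0.7, 0.6]
correct = [True, False, True, False]

gap = calibration_gap(confidences, correct)
print(f"calibration gap: {gap:+.2f}")  # positive means overconfident
```

Here average confidence is 0.75 and accuracy is 0.50, so the gap is about +0.25: overconfident by 25 points.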
Brier score
Mean squared error of probabilistic claims.
The average squared distance between a stated probability and what actually happened, with the outcome coded 1 if the event occurred and 0 if it did not (0 = perfect, 1 = worst). Because it rewards being both right and appropriately unsure, it is the headline metric on the public Model Calibration Index and on the Scorecard.
Primary source →
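The definition can be sketched in a few lines of Python for yes/no outcomes. The function name and the sample forecasts are illustrative assumptions, not AEQUARA's published code.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts and what actually happened (1 = happened, 0 = did not).
probabilities = [0.9, 0.7, 0.2]
outcomes = [1, 0, 0]

print(f"Brier score: {brier_score(probabilities, outcomes):.2f}")
```

The three squared errors are 0.01, 0.49, and 0.04, so the score is about 0.18. Always answering 50% scores 0.25 no matter what happens, which is a handy reference point.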
Proper scoring rule
A score you can’t game by lying about confidence.
A scoring rule is proper when a forecaster minimizes its expected score only by reporting its true probabilities; hedging or bluffing makes the expected score worse, even if it occasionally pays off on a single question. The Brier score is a proper scoring rule, which is why AEQUARA scores on it.
Primary source →
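One way to see this for the Brier score is to compute the expected score for every possible report and find the minimum. The sketch below is a numeric check under an assumed true probability of 0.7; the names are illustrative.

```python
def expected_brier(reported, true_probability):
    """Expected squared error when the event truly occurs with true_probability."""
    return (true_probability * (reported - 1.0) ** 2
            + (1.0 - true_probability) * reported ** 2)


true_probability = 0.7  # assumed for illustration

# Try every report from 0% to 100% in one-point steps.
candidates = [i / 100 for i in range(101)]
best_report = min(candidates, key=lambda q: expected_brier(q, true_probability))

print(f"best report: {best_report:.2f}")
print(f"honest 0.70: {expected_brier(0.70, true_probability):.4f}")
print(f"bluff  0.90: {expected_brier(0.90, true_probability):.4f}")
```

Algebraically, the expected score equals p(1 - p) + (q - p)^2 for true probability p and report q, so any report other than q = p adds a penalty.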
Overconfidence
Confidence runs ahead of accuracy.
The systematic tendency to be more certain than the evidence warrants — the most common miscalibration in both humans and language models (Lichtenstein & Fischhoff, 1977; Moore & Healy, 2008). It shows up as a positive calibration gap.
Primary source →
Resolution (sharpness)
How decisively a forecaster separates outcomes.
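A calibrated forecaster that only ever states the base rate (say, "50%" on every coin-flip question) is honest but useless. Resolution measures how far the outcomes behind each forecast sit from that base rate; for a calibrated forecaster, that is the same as how far its predictions move away from the base rate toward 0 or 1. It is the decisiveness that makes a calibrated forecast actually informative.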
Reliability
The calibration term of the Brier decomposition.
In the Murphy decomposition, the Brier score splits into three parts: reliability, resolution, and uncertainty (Brier = reliability − resolution + uncertainty). Reliability is the component that captures calibration error directly: how far predicted probabilities sit from observed frequencies. Lower is better.
Primary source →
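The decomposition can be checked numerically. The sketch below groups forecasts by their exact stated value, which is the case where the identity Brier = reliability − resolution + uncertainty holds exactly; the function names and data are illustrative assumptions.

```python
from collections import defaultdict


def brier(probabilities, outcomes):
    """Plain Brier score, used here only to check the decomposition."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def murphy_decomposition(probabilities, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by forecast value."""
    n = len(probabilities)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probabilities, outcomes):
        groups[p].append(o)
    reliability = 0.0
    resolution = 0.0
    for p, group in groups.items():
        observed = sum(group) / len(group)  # how often it happened in this group
        reliability += len(group) * (p - observed) ** 2
        resolution += len(group) * (observed - base_rate) ** 2
    return reliability / n, resolution / n, base_rate * (1.0 - base_rate)


# Hypothetical forecaster: says 80% five times and 20% five times.
probabilities = [0.8] * 5 + [0.2] * 5
outcomes = [1, 1, 1, 0, 0] + [0, 0, 0, 0, 1]

rel, res, unc = murphy_decomposition(probabilities, outcomes)
print(f"reliability={rel:.2f} resolution={res:.2f} uncertainty={unc:.2f}")
print(f"rel - res + unc = {rel - res + unc:.2f}, Brier = {brier(probabilities, outcomes):.2f}")
```

In this example the "80%" forecasts came true only 60% of the time, which is what the nonzero reliability term picks up.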
Reliability diagram
Predicted probability vs. observed frequency.
A plot of what a forecaster said would happen (x) against how often it did (y). A perfectly calibrated forecaster sits on the 45° diagonal. Points above the line show underconfidence (events happened more often than predicted) and points below it show overconfidence, each at a specific probability band.
Expected Calibration Error (ECE)
Bucketed average calibration gap.
Predictions are binned by confidence, each bin’s accuracy is compared to its average confidence, and the absolute gaps are averaged (weighted by bin size). A compact scalar summary of a reliability diagram.
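A minimal sketch of that binning, assuming ten equal-width bins (a common default, not something the glossary specifies). The function name and data are illustrative. The per-bin pairs of average confidence and accuracy are also the points you would plot on a reliability diagram.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average of |mean confidence - accuracy| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    bins = [[] for _ in range(n_bins)]
    for confidence, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, 1.0 if hit else 0.0))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(mean_confidence - accuracy)
    return ece


# Hypothetical answers: two at 95% confidence, two at 55%, half of each right.
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [True, False, True, False]

print(f"ECE: {expected_calibration_error(confidences, correct):.2f}")
```

The 95% bin is off by 0.45 and the 55% bin by 0.05; each holds half the answers, so the ECE is about 0.25. Because the gaps are absolute, over- and underconfidence in different bins do not cancel out the way they can in the signed calibration gap.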
Perceived likelihood
How much a buyer believes the output is right.
In Hormozi’s value equation, perceived likelihood of achievement is one of the four levers of value. AEQUARA’s thesis is that calibration, made recomputable by anyone, is what makes perceived likelihood provable rather than asserted.
Primary source →
Attestation
A claim is hash-locked and recomputable.
Where a vendor asserts ("our AI is accurate"), AEQUARA attests: a specific claim is canonicalized, hash-locked, signed (HMAC-SHA256), and written to an append-only ledger whose hash you can re-derive yourself. Verifiable output, not a vibe.
Primary source →
Hash-lock (content-addressing)
A claim’s fingerprint, re-derivable in your browser.
The claim is serialized deterministically (sorted-key recursive JSON), then hashed. Any change to the content changes the hash, so the digest is a tamper-evident fingerprint anyone can recompute and check against the ledger.
Primary source →
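The idea can be sketched with Python's standard library. This is an assumption-laden illustration: it uses plain SHA-256 as the digest and `json.dumps` with sorted keys as the canonical form, and the claim fields are invented. AEQUARA's actual serialization rules may differ in details such as number formatting, so check the primary source before comparing digests.

```python
import hashlib
import json


def canonicalize(claim):
    """Serialize with recursively sorted keys and no extra whitespace."""
    return json.dumps(claim, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hash(claim):
    """SHA-256 hex digest of the canonical form (digest choice is an assumption)."""
    return hashlib.sha256(canonicalize(claim).encode("utf-8")).hexdigest()


# Hypothetical claim; the field names are invented for this example.
claim = {"model": "example-model", "brier": 0.18, "n": 3}
reordered = {"n": 3, "brier": 0.18, "model": "example-model"}
tampered = {"model": "example-model", "brier": 0.17, "n": 3}

print(canonicalize(claim))
print(content_hash(claim) == content_hash(reordered))  # key order does not matter
print(content_hash(claim) == content_hash(tampered))   # any content change does
```

Sorting keys is what makes the fingerprint depend on the content rather than on how a particular program happened to order the fields.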
Recomputability
You re-derive the proof yourself.
The independence in AEQUARA’s proof spine comes not from who signs but from the fact that anyone can recompute the hash in their own browser and check it against the public ledger. Today the signature (HMAC-SHA256) is our own — it makes the record tamper-evident; an independent third-party anchor is in progress. Trust the arithmetic, not the signer.
Primary source →
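The difference between the two checks can be sketched as follows. Recomputing a hash needs no secret, so anyone can do it; producing an HMAC needs the signer's key, which is why it shows that the key holder wrote the record rather than providing independent proof. The digest choice (SHA-256), the function names, the keys, and the sample text are all assumptions for illustration.

```python
import hashlib
import hmac


def recompute_digest(canonical_text):
    """Anyone can do this: no key required."""
    return hashlib.sha256(canonical_text.encode("utf-8")).hexdigest()


def matches_ledger(canonical_text, published_digest):
    """Compare a locally recomputed digest with the one published on the ledger."""
    return hmac.compare_digest(recompute_digest(canonical_text), published_digest)


def sign_digest(digest, secret_key):
    """Only the key holder can do this: HMAC-SHA256 over the digest."""
    return hmac.new(secret_key, digest.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical canonical claim text and a stand-in for its ledger entry.
canonical_text = '{"brier":0.18,"model":"example-model","n":3}'
published_digest = recompute_digest(canonical_text)

print(matches_ledger(canonical_text, published_digest))                          # True
print(matches_ledger(canonical_text.replace("0.18", "0.17"), published_digest))  # False
```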
NPX-100
AEQUARA’s published calibration index.
A cross-industry, 8-axis index that scores 0–100 and is published on a quarterly cycle — the flagship demonstration that calibration can be measured consistently across very different forecasters.
Primary source →

Common questions

What is the difference between accuracy and calibration?

Accuracy is how often a forecaster is right. Calibration is whether its confidence matches that accuracy — a model can be accurate but overconfident, or modest but well-calibrated. AEQUARA measures and attests calibration, because a confidence number you can trust is what makes a high-stakes AI output usable.

Why does AEQUARA use the Brier score?

The Brier score is a proper scoring rule: a forecaster minimizes it only by reporting its true probabilities, so it cannot be gamed by overstating or understating confidence. That makes it the honest headline metric for comparing forecasters.

Is a low calibration gap the same as being smart?

No. Calibration is about honesty of confidence, not raw intelligence. A well-calibrated forecaster knows the limits of what it knows — it says 60% when it should and is right about 60% of those times. That self-knowledge is exactly what high-stakes decisions need.

How can I check these definitions are not just marketing?

Every term that AEQUARA acts on links to its primary source — the methodology, the Brier explainer, the attestation ledger — and the ledger entries are hash-locked and re-derivable in your own browser. The glossary is an index of claims we already stand behind, not new ones.

Want to feel the difference between confidence and accuracy? Measure your own calibration — or see how attestation differs from assertion.

Take the Calibration Scorecard →
Attest vs. assert →
