The cheapest good answer is usually the best one — what the Trust Index shows about paying for quality

Of the 18 models on the AI Trust Index, only three sit on the frontier where you can’t buy more trust for the same money. The other fifteen are, in a precise sense, overpaying or underdelivering.

By AEQUARA · July 1, 2026

Most AI buying decisions get made on a single axis: which model scores highest? The AI Trust Index tracks a second axis most leaderboards ignore entirely — cost — and cross-referencing the two produces an uncomfortable finding. Of the 18 models we score, only three sit on the Pareto frontier, the set of models where you genuinely cannot buy more calibrated trust for the same money spent. Every other model on the list is either paying a premium for trust it could get cheaper elsewhere, or leaving trust on the table for a price it’s already paying.

What “Pareto-efficient” means here

A model is Pareto-efficient on our trust-vs-cost chart if no other model dominates it: nothing else scores at least as high for the same or lower price while being strictly better on at least one of the two. It’s a purely relative measurement: a model can be excellent in absolute terms and still not be Pareto-efficient if something else on the list matches its score for less.

The three that clear the bar

As of the current run: DeepSeek-V3 (Trust 94, $0.0027 per call) is Pareto-efficient at the very top of the field, matching the highest measured score at a fraction of a cent. Qwen3 32B and Llama 3.3 70B (Trust 89 each, running locally at effectively $0 marginal cost) are Pareto-efficient further down the curve, for anyone whose task doesn’t need the absolute ceiling. That’s three models, out of eighteen, where the honest answer to “can I get this score, or a better one, for less?” is no.
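The dominance test behind that claim is small enough to sketch. The block below is a minimal illustration, not the Index’s own code: the `Model` class and the `dominates` and `pareto_frontier` helpers are names made up for this example, and the two `hypothetical-*` entries are invented to stand in for models inside the frontier. Only the three named models carry figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    trust: float          # Trust Index score, higher is better
    cost_per_call: float  # USD per call, lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.trust >= b.trust and a.cost_per_call <= b.cost_per_call
    strictly_better = a.trust > b.trust or a.cost_per_call < b.cost_per_call
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


models = [
    # Figures quoted in this article.
    Model("DeepSeek-V3", 94, 0.0027),
    Model("Qwen3 32B", 89, 0.0),
    Model("Llama 3.3 70B", 89, 0.0),
    # Hypothetical entries, invented for illustration only.
    Model("hypothetical-premium", 94, 0.0200),
    Model("hypothetical-midrange", 91, 0.0100),
]

frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

Two models tied on both axes, as the two local models are here, do not dominate each other, which is why both stay on the frontier. The hypothetical premium entry drops out because DeepSeek-V3 matches its score for less.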

What the other fifteen are doing

Everything else on the index sits inside the frontier, which sounds like an indictment but usually isn’t — a model can be a reasonable choice for reasons the trust-vs-cost chart doesn’t capture: latency requirements, a specific capability the task actually needs, contractual or compliance constraints, or simply a workflow already built around a particular provider. What being inside the frontier does mean is that the choice is no longer justified by trust score alone — if you’re paying a premium over a Pareto-efficient alternative with a comparable score, the premium needs a different reason than “it’s more trustworthy,” because on this measurement, for that money, it usually isn’t.

Why this needs to be an ongoing measurement, not a one-time chart

The frontier moves. A model can be released, gated, repriced, or deprecated between measurement cycles, and a Pareto-efficient choice today can stop being one by the next run. That is what happened with the field’s own #1 finisher: see “what Fable 5’s #1 ranking actually shows” for a case study in a top score that isn’t the Pareto-efficient pick at all. A static “best model” recommendation goes stale the moment any input changes; a chart recomputed every cycle doesn’t.

Where this becomes infrastructure instead of a chart

This is the specific decision NEXUS’s Pareto Model Routing is designed to automate: sending each query to the cheapest model that clears the quality bar for that task, re-evaluated continuously rather than hard-coded to whichever model won a leaderboard on a given day. NEXUS is in development (API opening Q3 2026), so this describes design intent rather than a claim about what’s routing traffic today. Still, the trust-vs-cost data above is exactly the kind of measurement needed to make that decision well, whether a system or a person is making it.
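The routing rule itself, cheapest model that clears the bar, can be sketched in a few lines. This is an illustration of the idea only and says nothing about how NEXUS is built: `Candidate`, `route`, and the `min_trust` thresholds are hypothetical, and the sketch assumes the overall Trust score is a usable stand-in for per-task quality, which a real router would need to measure separately.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    trust: float          # measured trust score, higher is better
    cost_per_call: float  # USD per call


def route(candidates: list[Candidate], min_trust: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose trust meets the bar, or None if none does."""
    eligible = [c for c in candidates if c.trust >= min_trust]
    if not eligible:
        return None
    # Cheapest first; break price ties in favour of the higher trust score.
    return min(eligible, key=lambda c: (c.cost_per_call, -c.trust))


# Scores and prices are the ones quoted in this article; the thresholds are made up.
candidates = [
    Candidate("DeepSeek-V3", 94, 0.0027),
    Candidate("Qwen3 32B", 89, 0.0),
    Candidate("Llama 3.3 70B", 89, 0.0),
]

relaxed = route(candidates, min_trust=85)      # a local model is enough
strict = route(candidates, min_trust=92)       # only the top score clears the bar
unreachable = route(candidates, min_trust=99)  # nothing qualifies
```

Because `route` takes the candidate list as an argument, rerunning it against each new measurement cycle is all it takes to keep the choice current; nothing about the winner is hard-coded.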

The full trust-vs-cost chart, re-rankable by any axis, is at the model index — recomputable from the same published data, not asserted.
