Verifiable, Not Eliminated
Why grounding an AI to your documents is not enough – and what it looks like when hallucinations become catchable at review instead of invisible behind fluent prose.
The failure mode that matters most in applied AI is not the obvious one. A model that invents a court case from nothing is embarrassing but detectable: check the citation and the lie collapses. The dangerous failure mode is subtler. Real facts, shuffled into a false combination. A genuine company name attached to the wrong contract. A correct statistic under the wrong heading. The atoms are all true; the sentence is false. Standard fixes catch the first kind and largely miss the second.
Why grounding alone is not enough
The dominant response to hallucination is retrieval-augmented generation (RAG) – feed the model your documents and it will stay on the rails. Useful, and almost everyone does it. But RAG constrains the source of raw material; it does not constrain what the model is allowed to say about that material. A 2025 Stanford study of premium legal-research tools – marketed as “hallucination-free,” grounded in proprietary legal databases – found they still hallucinated 17 to 33 percent of the time (Magesh et al., 2025). OpenAI’s researchers argued in 2025 that this is structural: the training process rewards confident prose, and moving to a newer model version does not fix that (Kalai et al., 2025).
The harder case is worse still. The 2025 MontageLie benchmark (arXiv:2505.15792) tested whether standard verification methods catch deceptive texts built entirely from true statements – real facts, reordered to distort causality without introducing a single fabrication. Atomic-fact checkers, which verify statements individually, scored AUC-ROC below 65% on this benchmark: they have no way to detect manipulation that lives between the facts rather than inside them.
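A toy illustration of why per-statement checking cannot see this kind of manipulation. This is a minimal sketch, not the MontageLie benchmark's method: the function name `atomic_fact_score`, the example sentences, and the exact-match rule are all assumptions made for illustration.

```python
def atomic_fact_score(sentences, verified_facts):
    """Fraction of sentences that individually match a verified fact.

    Hypothetical stand-in for an atomic-fact checker: each sentence is
    judged on its own, so sentence order never enters the score.
    """
    supported = sum(1 for s in sentences if s in verified_facts)
    return supported / len(sentences)


# Two statements, both true in isolation (invented example data).
verified_facts = {
    "The plant reported an accident.",
    "The company cut its safety budget.",
}

# Source order: the accident came first, the budget cut came later.
faithful = [
    "The plant reported an accident.",
    "The company cut its safety budget.",
]

# Reordered text: same sentences, but the order now suggests
# that the budget cut led to the accident.
montage = list(reversed(faithful))

faithful_score = atomic_fact_score(faithful, verified_facts)
montage_score = atomic_fact_score(montage, verified_facts)
```

Both texts get the same score because the checker only asks whether each sentence is true, never whether the sequence preserves the relationship between them.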
The question is not “how do we make the model never hallucinate.” It is “how do we make hallucinations catchable at review, instead of invisible behind fluent prose.”
What we built
A model normally does three things in one pass: find the facts, resolve disagreements between sources, and write the prose. When those pressures collide in a single generation step, fluency wins – sounding right beats being right. The fix is pulling those three steps apart. Each source claim is typed and recorded before any prose is written. Conflicts between sources are resolved by explicit, auditable rules – not by the model’s judgment inside the prose. Only then does the model write, under a hard constraint: every checkable token (a number, date, or named entity) must trace back to a claim that survived resolution. This is also the structural answer to the MontageLie problem: inter-fact relationships are fixed at the resolution stage, before the generator runs, rather than left to a post-hoc detector.
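The record-then-resolve steps could look something like the following. This is a sketch under stated assumptions, not the pipeline's actual code: the `Claim` fields, the `source_rank` rule (lowest rank wins, ties stay unresolved), and the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    key: str          # what the claim is about (hypothetical naming scheme)
    value: str        # the asserted value
    source: str       # where the claim came from
    source_rank: int  # assumed rule input: lower means more authoritative


def resolve(claims):
    """Apply one explicit rule per key: the lowest source_rank wins.

    If the top-ranked sources disagree, the key is reported as a conflict
    instead of being settled silently.
    """
    by_key = {}
    for claim in claims:
        by_key.setdefault(claim.key, []).append(claim)

    resolved, conflicts = {}, []
    for key, group in by_key.items():
        best_rank = min(c.source_rank for c in group)
        best = [c for c in group if c.source_rank == best_rank]
        if len({c.value for c in best}) == 1:
            resolved[key] = best[0]
        else:
            conflicts.append(key)
    return resolved, conflicts


# Invented example data.
claims = [
    Claim("acme.revenue_2024", "4.2M", "annual_report", 1),
    Claim("acme.revenue_2024", "4.5M", "press_article", 2),
    Claim("acme.ceo", "J. Doe", "filing_a", 1),
    Claim("acme.ceo", "R. Roe", "filing_b", 1),
]
resolved, conflicts = resolve(claims)
```

Because the rule is plain code rather than model judgment, a reviewer can see exactly why one value survived, and the generator only ever receives the contents of `resolved`.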
A verification pass then scans the finished prose and maps every number, date, and named entity against its backing claim. The result is not a pass/fail score – it is a labelled output the reviewer can read. Every backed claim is marked with its provenance type. Every unbacked token is flagged by severity: FABRICATED means the prose asserted something the claim set never authorised; REVIEW means the backing is present but weak. Unbacked claims surface as reviewable items. They do not hide in the prose.
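A verification pass of this shape can be sketched as follows. The sketch covers numbers and ISO-format dates only; named entities would need an entity-recognition step that is not shown. The `BackingClaim` fields, the `weak` flag as the definition of weak backing, the exact-string matching, and the `BACKED:` label format are assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BackingClaim:
    value: str          # checkable value exactly as it should appear in prose
    provenance: str     # hypothetical provenance type, e.g. "primary_filing"
    weak: bool = False  # assumed marker for weak backing


# ISO dates first, then integers/decimals with an optional percent sign.
TOKEN_RE = re.compile(r"\d{4}-\d{2}-\d{2}|\d+(?:\.\d+)?%?")


def verify(prose, claims):
    """Label every checkable token in the prose against the claim set."""
    backing = {c.value: c for c in claims}
    report = []
    for match in TOKEN_RE.finditer(prose):
        token = match.group()
        claim = backing.get(token)
        if claim is None:
            label = "FABRICATED"  # no surviving claim authorises this token
        elif claim.weak:
            label = "REVIEW"      # backed, but only weakly
        else:
            label = f"BACKED:{claim.provenance}"
        report.append((token, label))
    return report


# Invented example data.
claims = [
    BackingClaim("12%", "primary_filing"),
    BackingClaim("2025-03-01", "news_report", weak=True),
]
report = verify("Revenue rose 12% on 2025-03-01, up from 40.", claims)
```

The output is a list of labelled tokens rather than a single score, so the reviewer sees which specific items need attention.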
The discipline running through all of it: measure first, gate later. Every threshold – how many unbacked tokens trigger a block, what counts as a weak source – is set from measured data on real output, not from an assumption made before any measurement was taken. The measurement layer of the verification pass is live on the research-digest pipeline, producing a provenance report for every digest without blocking output. Full gates follow once real baselines have accumulated. A retrofit to the signal-validation and paper pipelines is in design review.
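One way to express "measure first, gate later" in code, as a sketch only: the nearest-rank percentile rule, the default of the 95th percentile, and the minimum of 30 samples are hypothetical choices, not values taken from the pipeline.

```python
def derive_threshold(baseline_counts, percentile=95, min_samples=30):
    """Derive a blocking threshold from measured unbacked-token counts.

    Returns None while too few baselines exist, which keeps the pipeline
    in measurement-only mode. Uses the nearest-rank percentile with
    integer arithmetic.
    """
    n = len(baseline_counts)
    if n < min_samples:
        return None
    ordered = sorted(baseline_counts)
    rank = max(1, (percentile * n + 99) // 100)  # ceil(percentile * n / 100)
    return ordered[rank - 1]


def should_block(unbacked_count, threshold):
    """Block only when a measured threshold exists and is exceeded."""
    return threshold is not None and unbacked_count > threshold


# Invented baseline: unbacked-token counts from 40 past digests.
baseline = list(range(40))
threshold = derive_threshold(baseline)
```

With no threshold yet, `should_block` always returns `False`, so reports are still produced but nothing is held back until enough real output has been measured.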
Honest positioning
The claim is not that hallucination has been eliminated. The OpenAI paper cited above (Kalai et al., 2025) explains why it cannot be. The claim is that a hallucination, if it occurs, shows up as an unsourced token at review rather than passing as fluent prose. That is a much lower bar than “perfect,” and it is a bar that can be hit and measured. Most commercial tools selling hallucination control advertise elimination; the 2025 Stanford study (Magesh et al., 2025) showed that claim does not survive inspection. “Verifiable, not eliminated” is the honest alternative – and, on current evidence, the defensible one.
References
- Magesh et al., “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools” – Stanford RegLab / Journal of Empirical Legal Studies, 2025. https://onlinelibrary.wiley.com/doi/full/10.1111/jels.12413
- Kalai, Nachum, Vempala, Zhang, “Why Language Models Hallucinate” – OpenAI, arXiv:2509.04664, 2025. https://arxiv.org/abs/2509.04664
- “Long-Form Information Alignment Evaluation Beyond Atomic Facts” (MontageLie / DoveScore) – arXiv:2505.15792, 2025. https://arxiv.org/abs/2505.15792
- Min et al., “FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation” – arXiv:2305.14251, 2023. https://arxiv.org/abs/2305.14251