Shipping the anti-hallucination engine (Sprint 3 launch)

Draft — targeting publication 2026-04-15 · 9 min read

Editor's note (2026-06-07). This build log describes the full ten-layer design. As of today, one layer (paragraph-level fact-checking with calibrated confidence) is live in production; the remaining layers ship incrementally. Kept as-is as a design document.

Hallucinations are the single most expensive failure mode of modern AI assistants. They are not rare. They are not bugs. They are the predictable output of systems optimised to sound fluent rather than to be correct.

TANYA's Sprint 3 ships the first public version of the anti-hallucination engine — the subsystem that sits between the language models and the user, and refuses to let confident nonsense through. This post is a build log of how it works.

The core idea in one paragraph

When TANYA needs to answer a factual question, it does not ask a single model. It asks multiple models independently, compares their claims at the proposition level, checks those propositions against grounded sources, and only emits the claims that survive all three gates. If a claim fails, TANYA retrieves evidence, asks a clarifying question, or flatly says "I do not know."
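The flow in that paragraph can be sketched in TypeScript. This is a minimal illustration, not TANYA's actual code: the `CandidateClaim` shape, the two example gates, and the rule for picking a fallback are all assumptions made for the example.

```typescript
// Hypothetical types and names for illustration; not taken from the TANYA codebase.
interface CandidateClaim {
  text: string;
  modelsAgreeing: number; // drafts that asserted this proposition
  modelsAsked: number; // drafts generated in total
  sources: string[]; // grounded sources that support the claim
  questionIsAmbiguous: boolean; // set when the user's question was unclear
}

type Gate = (claim: CandidateClaim) => boolean;

type Outcome =
  | { kind: "emit"; text: string }
  | { kind: "retrieve_evidence"; text: string }
  | { kind: "ask_clarifying_question" }
  | { kind: "say_i_do_not_know" };

// Example gates (assumed): a strict majority of drafts agree, and at least one source backs the claim.
const majorityAgrees: Gate = (c) => c.modelsAgreeing * 2 > c.modelsAsked;
const hasSource: Gate = (c) => c.sources.length > 0;

// A claim is emitted only if every gate passes. The fallback order below is an assumption.
function gateClaim(claim: CandidateClaim, gates: Gate[], retrievalAlreadyTried: boolean): Outcome {
  if (gates.every((gate) => gate(claim))) {
    return { kind: "emit", text: claim.text };
  }
  if (claim.questionIsAmbiguous) {
    return { kind: "ask_clarifying_question" };
  }
  if (!retrievalAlreadyTried) {
    return { kind: "retrieve_evidence", text: claim.text };
  }
  return { kind: "say_i_do_not_know" };
}

// All three drafts agree, but no source has been attached yet.
const exampleClaim: CandidateClaim = {
  text: "The Eiffel Tower opened in 1889.",
  modelsAgreeing: 3,
  modelsAsked: 3,
  sources: [],
  questionIsAmbiguous: false,
};
const firstPass = gateClaim(exampleClaim, [majorityAgrees, hasSource], false);
const secondPass = gateClaim(exampleClaim, [majorityAgrees, hasSource], true);
```

In this sketch the unsourced claim first triggers evidence retrieval (`firstPass.kind` is `"retrieve_evidence"`). If retrieval has already been tried and the claim is still unsourced, the result is `"say_i_do_not_know"`.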

The ten-layer pipeline

Every response flows through a ten-layer pipeline. The anti-hallucination engine lives at layers 4 through 7.

1. Input sanitisation: prompt injection defence.
2. Intent classification: routing to the right subsystem.
3. Memory retrieval: conversation, episodic, semantic.
4. Multi-model drafting: N independent generations.
5. Claim extraction: decompose drafts into atomic propositions.
6. Cross-model consensus: propositions that diverge are flagged.
7. Source grounding: every surviving claim gets a citation or a hedge.
8. Calibration: confidence scores per claim, not per response.
9. Refusal gate: below threshold, the response is withheld.
10. Rendering: final format, with inline evidence.
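The layer ordering can be written down as data. The sketch below is illustrative only: the layer names come from the list above, while `PipelineState`, `Layer`, `placeholderLayer`, and `runPipeline` are hypothetical names, and each layer is a placeholder that merely records that it ran.

```typescript
// Hypothetical sketch: layer names follow the post; every type and function here is assumed.
interface PipelineState {
  input: string;
  trace: string[]; // names of the layers that have run, in order
}

interface Layer {
  name: string;
  run: (state: PipelineState) => PipelineState;
}

const LAYER_NAMES: string[] = [
  "Input sanitisation",
  "Intent classification",
  "Memory retrieval",
  "Multi-model drafting",
  "Claim extraction",
  "Cross-model consensus",
  "Source grounding",
  "Calibration",
  "Refusal gate",
  "Rendering",
];

// Placeholder layer that only records that it ran; a real layer would transform the state.
function placeholderLayer(name: string): Layer {
  return {
    name: name,
    run: (state) => ({ input: state.input, trace: state.trace.concat([name]) }),
  };
}

// Run the layers strictly in order, threading one state object through all of them.
function runPipeline(layers: Layer[], input: string): PipelineState {
  let state: PipelineState = { input: input, trace: [] };
  for (const layer of layers) {
    state = layer.run(state);
  }
  return state;
}

const layers = LAYER_NAMES.map((name) => placeholderLayer(name));

// The post numbers layers from 1, so layers 4 through 7 are array indexes 3 to 6.
const engineLayers = LAYER_NAMES.slice(3, 7);
const finalState = runPipeline(layers, "When did the Eiffel Tower open?");
```

Here `engineLayers` holds the four layers the post assigns to the anti-hallucination engine, and `finalState.trace` lists all ten layers in the order they ran.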

What was hard

The interesting engineering is not in any single layer. It is in the propositional graph that lets the layers talk to each other. Early prototypes treated each layer as a filter on a string. That fails because strings lose the structure you need to reason about uncertainty. Rewriting the pipeline around a typed graph of atomic claims unlocked everything else.
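To make "a typed graph of atomic claims" concrete, here is one shape such a graph could take. It is a sketch under assumptions, not the engine's real data model: every type, field, and function name below is hypothetical, and the example claims and source label are made up.

```typescript
// Hypothetical data model; field and function names are illustrative, not TANYA's.
type ClaimId = string;

interface ClaimNode {
  id: ClaimId;
  text: string; // one atomic proposition
  assertedBy: string[]; // which model drafts made this claim
  sources: string[]; // filled in by a grounding step
  confidence: number | null; // filled in by a calibration step
}

interface ClaimEdge {
  from: ClaimId;
  to: ClaimId;
  kind: "supports" | "contradicts";
}

interface ClaimGraph {
  nodes: ClaimNode[];
  edges: ClaimEdge[];
}

function findClaim(graph: ClaimGraph, id: ClaimId): ClaimNode | null {
  for (const node of graph.nodes) {
    if (node.id === id) return node;
  }
  return null;
}

// A layer annotates a node in place instead of rewriting a string.
function addSource(graph: ClaimGraph, id: ClaimId, source: string): void {
  const node = findClaim(graph, id);
  if (node !== null) node.sources.push(source);
}

// Claims connected to `id` by a "contradicts" edge, in either direction.
function contradictionsOf(graph: ClaimGraph, id: ClaimId): ClaimId[] {
  const result: ClaimId[] = [];
  for (const edge of graph.edges) {
    if (edge.kind !== "contradicts") continue;
    if (edge.from === id) result.push(edge.to);
    else if (edge.to === id) result.push(edge.from);
  }
  return result;
}

// Two drafts disagree about a date; the disagreement is an edge, not lost text.
const graph: ClaimGraph = {
  nodes: [
    { id: "c1", text: "The bridge opened in 1937.", assertedBy: ["model-a"], sources: [], confidence: null },
    { id: "c2", text: "The bridge opened in 1936.", assertedBy: ["model-b"], sources: [], confidence: null },
  ],
  edges: [{ from: "c1", to: "c2", kind: "contradicts" }],
};
addSource(graph, "c1", "hypothetical-archive-record");
const conflicts = contradictionsOf(graph, "c2");
```

The point of the structure is that later steps can ask questions a string cannot answer: `conflicts` tells the consensus step that `c2` is disputed by `c1`, and the `sources` array on `c1` tells the grounding step which side has evidence.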

The second hard part was getting the models to cooperate during the consensus step. Ask two frontier models the same factual question and you will get two eloquent, plausible, contradictory answers. The engine has to accept that disagreement is signal, not noise, and degrade gracefully into a hedge when consensus is low.
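A toy version of "disagreement is signal" might look like the following. It assumes each model's answer to a single proposition has already been reduced to a short string; the 0.75 agreement threshold, the function names, and the hedge wording are assumptions for illustration.

```typescript
// Hypothetical sketch: the 0.75 threshold and the wording are assumptions.
interface AnswerGroup {
  display: string; // first-seen spelling of the answer
  count: number; // how many models gave it
}

// Group answers after trimming and lower-casing, keeping first-seen order.
function groupAnswers(answers: string[]): AnswerGroup[] {
  const groups: AnswerGroup[] = [];
  const keys: string[] = [];
  for (const raw of answers) {
    const display = raw.trim();
    const index = keys.indexOf(display.toLowerCase());
    const existing = index >= 0 ? groups[index] : undefined;
    if (existing !== undefined) {
      existing.count += 1;
    } else {
      keys.push(display.toLowerCase());
      groups.push({ display: display, count: 1 });
    }
  }
  return groups;
}

// State the majority answer when agreement is high; otherwise hedge and surface the alternatives.
function phraseWithConsensus(answers: string[], threshold: number = 0.75): string {
  const groups = groupAnswers(answers);
  const best = groups.reduce<AnswerGroup | null>(
    (current, group) => (current === null || group.count > current.count ? group : current),
    null,
  );
  if (best === null) return "I do not know.";
  const ratio = best.count / answers.length;
  if (ratio >= threshold) return best.display;
  const others = groups.filter((group) => group !== best).map((group) => group.display);
  return "I believe " + best.display + ", but other models answered " + others.join(", ") + ".";
}

const unanimous = phraseWithConsensus(["Paris", " paris ", "PARIS"]);
const divided = phraseWithConsensus(["1889", "1889", "1887"]);
```

With three matching answers, `unanimous` is simply `"Paris"`. With a two-to-one split the agreement ratio falls below the threshold, so `divided` becomes `"I believe 1889, but other models answered 1887."` rather than a confident statement.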

What it looks like from the outside

Users do not see the engine. They see one of three things:

1. A clean, cited answer.
2. A hedged answer with explicit uncertainty ("I believe X, but my sources disagree on Y").
3. A refusal ("I do not have enough evidence to answer this").
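One way to choose between these three outcomes from per-claim confidence scores is sketched below. The thresholds (0.8 to assert, 0.4 to show at all), the `ScoredClaim` shape, and the exact wording of the hedge are assumptions for illustration, not the engine's real values.

```typescript
// Hypothetical sketch: thresholds, field names, and exact wording are assumptions.
interface ScoredClaim {
  text: string; // proposition without trailing punctuation
  confidence: number; // per-claim score in [0, 1]
  citation: string | null;
}

interface VisibleResponse {
  kind: "cited" | "hedged" | "refusal";
  body: string;
}

function isAssertable(claim: ScoredClaim, assertAt: number): boolean {
  return claim.confidence >= assertAt && claim.citation !== null;
}

function renderResponse(
  claims: ScoredClaim[],
  assertAt: number = 0.8,
  refuseBelow: number = 0.4,
): VisibleResponse {
  // Claims under the refusal threshold are never shown.
  const usable = claims.filter((claim) => claim.confidence >= refuseBelow);
  if (usable.length === 0) {
    return { kind: "refusal", body: "I do not have enough evidence to answer this." };
  }
  // The answer is "cited" only if every shown claim is both confident and sourced.
  const allAssertable = usable.every((claim) => isAssertable(claim, assertAt));
  const sentences = usable.map((claim) =>
    isAssertable(claim, assertAt)
      ? claim.text + " [" + claim.citation + "]."
      : "I believe " + claim.text + ", but I am not certain.",
  );
  return { kind: allAssertable ? "cited" : "hedged", body: sentences.join(" ") };
}

const citedExample = renderResponse([
  { text: "Paris is the capital of France", confidence: 0.95, citation: "source-1" },
]);
const hedgedExample = renderResponse([
  { text: "Paris is the capital of France", confidence: 0.6, citation: null },
]);
const refusalExample = renderResponse([
  { text: "Paris is the capital of France", confidence: 0.2, citation: null },
]);
```

The same claim lands in a different bucket depending only on its score and citation: `citedExample.kind` is `"cited"`, `hedgedExample.kind` is `"hedged"`, and `refusalExample.kind` is `"refusal"`.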

The third option is the one I am proudest of. In my experience, commercial assistants rarely admit ignorance this cleanly.

Benchmarks

(To be filled in at publication. Targeting: TruthfulQA, HaluEval, FACTOR, plus an internal eval of 500 factual questions with adversarial framing.)

What is next

Sprint 4 connects the anti-hallucination engine to the world-intelligence module, so the grounding sources automatically expand as new trustworthy corpora come online. Sprint 5 adds multi-agent adversarial critique on top — a dedicated red-team model that actively tries to break the output.

The goal by end of year: a cognitive AI that is right more often than any single frontier model on factual questions, and that knows when it is not.

Try it

The engine is open source. Clone the repo, run pnpm tanya start, and point it at your own model endpoints. Feedback and issues are extremely welcome.

— The TANYA founder