What "citations" actually means
A citation is a verifiable pointer from a generated claim to the source that supports it. The user clicks the citation and lands on the exact passage that grounded the claim.
Three things that aren't citations, even though teams call them that:
- "The model added a list of URLs at the bottom of the answer." Not a citation. Not tied to specific claims.
- "The model wrote 'according to the document.'" Not a citation. The buyer can't verify.
- "We retrieved 5 chunks and the model wrote an answer." Not a citation. The retrieval is opaque to the user.
A real citation lets the user verify any specific claim by clicking through to the source. If a regulator can't reproduce "where did this number come from" in 5 seconds, you don't have citations.
Why this matters
For most consumer AI use cases, citations are nice-to-have. For regulated, B2B, or anything where the answer has to be defensible — legal, financial, medical, internal compliance, research — they're not optional. Without them, the system has the same trust profile as a chatbot: the model said it, and you have to take its word for it.
Buyers in regulated spaces increasingly know to ask. The teams that haven't built citations end up scrambling to add them after the procurement conversation goes sideways.
Pattern 1: Quote-then-attribute
The model is required to quote the source passage verbatim and follow with an attribution. Output shape:
"[Quote from source.]" — Source: [doc-id], paragraph [N].
Verification: the quote can be checked against the source corpus by simple string-matching. If the quote doesn't appear verbatim in the cited document, the citation is broken.
Eval check: for every output, run a string-search of each quoted passage against its cited source. Pass if every quote appears verbatim; fail otherwise. This eval is fully automatable and runs in CI.
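A minimal sketch of that check, assuming the output shape above and an in-memory corpus keyed by doc-id (`CORPUS`, `CITATION_RE`, and `check_verbatim_quotes` are illustrative names, not part of any library):

```python
import re

# Hypothetical in-memory corpus: doc-id -> full document text. In practice
# this reads from whatever store your retrieval layer already uses.
CORPUS: dict[str, str] = {"policy-2024-03": "... full document text ..."}

# Parses the output shape above: "quote" — Source: doc-id, paragraph N.
CITATION_RE = re.compile(r'"(?P<quote>[^"]+)"\s*[—–-]\s*Source:\s*(?P<doc>[\w.-]+)')

def check_verbatim_quotes(answer: str) -> list[str]:
    """Return failure messages; an empty list means the eval passes."""
    failures = []
    for match in CITATION_RE.finditer(answer):
        quote, doc_id = match.group("quote"), match.group("doc")
        source = CORPUS.get(doc_id)
        if source is None:
            failures.append(f"cited doc-id does not exist: {doc_id}")
        elif quote not in source:  # exact substring match, no fuzziness
            failures.append(f"quote not verbatim in {doc_id}: {quote[:60]}")
    # Note: an answer with zero citations passes this trivially; the
    # no-claims-without-citations eval (below) covers that gap.
    return failures
```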
Where it fits: legal, compliance, document Q&A. Anywhere the quote IS the evidence.
Pattern 2: Inline anchored citations
The model emits structured citations inside the answer text, pointing at specific document IDs and offsets. The UI renders them as clickable footnotes. Click a footnote, the source pane scrolls to the cited passage and highlights it.
Output looks like prose with footnote markers; the JSON payload includes citation metadata (doc-id, char offset start and end, page number where applicable). The renderer ties the two together.
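Concretely, the payload might look like this; the field names (`marker`, `doc_id`, `start`, `end`, `page`) are our assumption, not a standard:

```python
# Hypothetical citation payload for the answer prose. The renderer matches
# each "marker" to a footnote in the text; on click, it uses doc_id plus the
# character offsets to scroll the source pane and highlight the passage.
payload = {
    "answer": "Revenue grew 12% year over year.[1]",
    "citations": [
        {"marker": 1, "doc_id": "10k-2024", "start": 10421, "end": 10497, "page": 37},
    ],
}
```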
Verification has two halves: a programmatic check that the offsets land inside the cited document, and a semantic check that the text at those offsets actually supports the claim. The semantic check is a Haiku-judge call: "does the passage at this offset support this claim, yes or no?"
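A sketch of both halves, assuming the payload shape above and the Anthropic Python SDK for the judge (model choice and prompt wording are ours):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def offset_is_valid(citation: dict, corpus: dict[str, str]) -> bool:
    """Programmatic half: offsets must land inside the cited document."""
    doc = corpus.get(citation["doc_id"])
    return doc is not None and 0 <= citation["start"] < citation["end"] <= len(doc)

def passage_supports_claim(claim: str, passage: str) -> bool:
    """Semantic half: a yes/no judge call on the cited passage."""
    resp = client.messages.create(
        model="claude-3-5-haiku-20241022",  # any cheap judge model works
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": "Does this passage support this claim? Answer yes or no.\n\n"
                       f"Claim: {claim}\n\nPassage: {passage}",
        }],
    )
    return resp.content[0].text.strip().lower().startswith("yes")

# Usage: slice the cited passage out of the document, then judge it.
# passage = corpus[c["doc_id"]][c["start"]:c["end"]]
```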
Where it fits: research assistants, internal knowledge tools, anything where the answer is prose with embedded support.
Pattern 3: Tool-call provenance
For tool-using agents, every claim in the answer maps to a specific tool call that produced the data. The UI surfaces a "where did this come from" expansion that shows the tool call, its arguments, and its result.
Verification: each claim in the answer must have a corresponding tool call in the trace. Claims without tool-call provenance are flagged as "model knowledge" rather than "looked up," and the UI displays them differently.
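One way to implement that check, assuming the model tags each claim with an optional `source_call_id` and your logging gives you the set of tool-call IDs in the trace (all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source_call_id: str | None = None  # None = model knowledge, not looked up

def classify_claims(claims: list[Claim], trace_call_ids: set[str]) -> dict[str, list[Claim]]:
    """Bucket claims by provenance so the UI can render each kind differently."""
    buckets: dict[str, list[Claim]] = {"looked_up": [], "model_knowledge": [], "broken": []}
    for claim in claims:
        if claim.source_call_id is None:
            buckets["model_knowledge"].append(claim)  # rendered as unverified
        elif claim.source_call_id in trace_call_ids:
            buckets["looked_up"].append(claim)        # gets the "where did this come from" expansion
        else:
            buckets["broken"].append(claim)           # cites a call not in the trace: hard eval fail
    return buckets
```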
Where it fits: agentic systems doing data lookups, integrations with internal databases, anything where the source is a system rather than a document.
Pattern 4: Confidence-gated citation
The model is required to either cite a source or explicitly say "I don't have a source for this." The eval gates on one invariant: every claim either has a citation or has a confidence disclaimer attached. Unsourced, unhedged claims fail the eval.
This is the simplest pattern to retrofit onto an existing system. The hardest part is convincing the model to be honest about which claims have sources — system-prompt framing matters, and the eval is the enforcement mechanism.
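The framing looks something like this; treat the wording as a starting point to tune against your eval, not a proven recipe:

```python
# Illustrative system-prompt fragment; tune the wording against your own eval.
CITE_OR_DISCLAIM = """\
For every factual claim in your answer, do exactly one of the following:
1. Cite the source document and passage that supports it.
2. Write "I don't have a source for this" immediately after the claim.
Never state an unsourced claim without that disclaimer."""
```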
Where it fits: as a stepping stone to one of the other patterns, or as a baseline guarantee in systems where perfect citations aren't yet feasible.
The eval bar we install
For any system shipping citations to a regulated buyer, the eval suite (see the minimum viable eval) gets these specific cases:
- Verbatim-quote check. For pattern-1 systems, every quoted passage must appear verbatim in the cited source. Run on every PR.
- Offset-validity check. For pattern-2 systems, every citation offset must land inside the document and contain text that semantically supports the claim. Haiku judge.
- No-claims-without-citations check. Run a Haiku judge over the answer with the prompt "list every factual claim in this answer, and for each, indicate whether it has a citation or disclaimer." Fail if any claim is unsourced and undisclaimed (a sketch follows below).
- Adversarial cases. Hand the system inputs designed to elicit confidently wrong answers (questions outside the corpus, etc.). The system should either decline or disclose uncertainty.
Each of these is an eval case. They run on every PR. The first time you set them up, expect failures — that's the point. Fix the system until they pass.
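A sketch of the no-claims-without-citations judge from the list above, again using the Anthropic SDK (prompt wording and parsing are ours, and deliberately naive):

```python
import anthropic

client = anthropic.Anthropic()

JUDGE_PROMPT = """List every factual claim in the answer below. For each claim,
output exactly one line ending in CITED, DISCLAIMED, or UNSOURCED.

Answer:
{answer}"""

def no_unsourced_claims(answer: str) -> bool:
    """Fail (return False) if the judge finds any unsourced, undisclaimed claim."""
    resp = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(answer=answer)}],
    )
    # Naive parse: any UNSOURCED verdict fails the run. Production code should
    # ask for structured output, but this is the shape of the check.
    return "UNSOURCED" not in resp.content[0].text
```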
The implementation cost
Adding pattern-1 citations to an existing RAG-shaped system: ~3-5 days of focused work. Pattern 2 with a working UI: ~1-2 weeks. Pattern 3 (tool-call provenance) is mostly free if you already log tool calls (see the LLM logging pipeline); the hard part is the UI.
The cost is real. The cost of not having citations when the regulated buyer asks is "we don't get the deal." That math has been one-sided every time we've run it.
The summary
Verbatim quote, anchored offset, tool-call provenance, or confidence-gated. Pick one. Run the eval that proves it works. Citations aren't optional in the markets where AI features are about to mature — they're the gating requirement. Most teams will discover this when a deal slips. A small number will install citations before the deal slips.