AI Hallucination
An AI hallucination is a confident, plausible-sounding output from a language model that is factually wrong, fabricated, or unsupported by the retrieved sources. Common causes: thin retrieval, ambiguous prompts, and the model's tendency to fill gaps with its training-data priors.
Long definition
The model never says "I don't know" by default — it generates the highest-probability token sequence given its inputs. When that sequence happens to be wrong, the output looks identical to a correct answer. That confident wrongness is what the field calls hallucination.
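A toy sketch of that mechanism, with an invented vocabulary and invented scores: the decoder emits the highest-probability next token, and a fluent wrong answer can easily outscore an abstention.

```python
import math

# Toy next-token scores for the prompt "The capital of Australia is".
# The vocabulary and numbers are invented for illustration; real models
# score tens of thousands of tokens, but the failure mode is the same.
logits = {
    "Canberra": 2.1,       # correct answer
    "Sydney": 2.4,         # wrong, but heavily associated in training data
    "Melbourne": 1.0,
    "I don't know": -1.5,  # abstention is rarely the highest-scoring option
}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding: emit the single most probable token.
print({tok: round(p, 3) for tok, p in probs.items()})
print("model says:", max(probs, key=probs.get))  # "Sydney", stated fluently
```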
Common patterns:
- Fabricated citations — the model cites a study or URL that doesn't exist. Documented in legal cases (the 2023 Mata v. Avianca incident, where a lawyer submitted fake ChatGPT-generated case law).
- Misattributed quotes — the right quote, wrong author. Or the right author, fabricated quote.
- Conflated entities — two products, companies, or people merged into one in the answer because they co-occur heavily in the training data.
- Outdated facts asserted as current — the model uses parametric knowledge from a training cutoff, not retrieved data.
- Plausible-but-wrong details — the "glue on pizza" Google AI Overviews case (May 2024), where the model surfaced a Reddit joke as a serious cooking instruction.
For SEO and GEO, hallucination is a brand-risk surface, not just an accuracy problem:
- Wrong attribution — your brand cited as the source for a claim you never made. Or your competitor's product features attributed to you (sometimes flatteringly, sometimes not).
- Fabricated quotes from your team — the model invents a CEO statement that never happened.
- Mixed-up product specs — the model conflates your SKU's specs with a competitor's.
- Phantom citations — the model cites your URL for content that lives on a different page or doesn't exist on your site at all.
Mitigation tactics:
- Strong, consistent on-page facts. Repeat key claims across pages so the embedding signal is unambiguous.
- Schema.org structured data for products, organizations, and people — disambiguates entities for grounding systems (see the JSON-LD sketch after this list).
- Clean canonical URLs and stable page structure. Pages that change URL frequently confuse citation tracking.
- Brand monitoring across answer engines (Profound, Otterly, Athena) to catch and correct hallucinations in your space (a minimal probing sketch follows below).
- Direct correction channels where they exist: Google's feedback link in AI Overviews, OpenAI's feedback tools, Perplexity's report flow.
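To make the schema tactic concrete, here is a minimal sketch that emits Schema.org JSON-LD for an Organization and a Product. Every name, URL, SKU, and spec value is a placeholder to be replaced with your own canonical facts; the types and properties (Organization, Product, Brand, PropertyValue, sameAs) are standard Schema.org vocabulary.

```python
import json

# Minimal Schema.org JSON-LD for an organization and one of its products.
# Every name, URL, SKU, and spec value is a placeholder; the point is to
# state the same canonical facts, in the same wording, on every page that
# mentions the entity.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Widgets Inc.",
    "url": "https://www.example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-widgets",
    ],
}

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Widget Pro 3000",
    "sku": "WP-3000",
    "brand": {"@type": "Brand", "name": "Example Widgets Inc."},
    # Spell out the specs most likely to be conflated with a competitor's.
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "batteryLifeHours", "value": "12"},
        {"@type": "PropertyValue", "name": "weightGrams", "value": "340"},
    ],
}

# Emit as <script type="application/ld+json"> blocks for the page template.
for item in (organization, product):
    print('<script type="application/ld+json">')
    print(json.dumps(item, indent=2))
    print("</script>")
```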
Hallucination rates are falling as RAG and reranking improve, but hallucination has not disappeared and will not disappear in 2026. Treat it as a permanent operating condition of the surface.
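As a complement to the dedicated monitoring tools named above, a lightweight in-house check can probe an answer engine on a schedule and flag answers whose claims drift from your published facts. The sketch below assumes the OpenAI Python SDK as the client; the probe question, the fact list, and the substring-based flagging heuristic are illustrative assumptions, not a production pipeline.

```python
from openai import OpenAI  # pip install openai

# Facts you actually publish, in canonical wording. Anything a model asserts
# that contradicts or goes beyond this list is a candidate for human review.
CANONICAL_FACTS = {
    "battery life": "12 hours",
    "weight": "340 g",
    "warranty": "2 years",
}

PROBE = "What are the key specs of the Widget Pro 3000 by Example Widgets Inc.?"

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def probe_answer_engine() -> str:
    """Ask a chat model a brand question and return its answer text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works for a spot check
        messages=[{"role": "user", "content": PROBE}],
    )
    return response.choices[0].message.content


def flag_missing_facts(answer: str) -> list[str]:
    """Crude heuristic: flag canonical facts whose value never appears.

    A missing value does not prove a hallucination, but it marks answers a
    human should read before a wrong spec spreads further.
    """
    return [
        f"{name}: expected '{value}' not found in answer"
        for name, value in CANONICAL_FACTS.items()
        if value not in answer
    ]


if __name__ == "__main__":
    answer = probe_answer_engine()
    for issue in flag_missing_facts(answer):
        print("REVIEW:", issue)
```

The substring check is deliberately crude: the goal is to queue answers for human review, not to adjudicate them automatically, and the same loop can be pointed at any answer engine that exposes an API.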
Common misconceptions
- "GPT-4 / Gemini / Claude don't hallucinate anymore." They hallucinate less, especially when grounded, but rates above 0% persist across all frontier models. Anthropic, OpenAI, and Google publish ongoing eval numbers — none are at zero.
- "Hallucinations only happen on obscure topics." They happen on popular topics too, particularly when sources disagree, the topic has rapidly changing facts, or the question phrasing is ambiguous. Health, finance, and legal queries are higher-risk regardless of popularity.
- "You can't do anything about hallucinations affecting your brand." You can. Strong, consistent, schema-marked-up content reduces ambiguity. Brand monitoring catches issues. Direct correction channels exist for Google, OpenAI, and Perplexity. The defense is operational, not technical.