
What is hallucination in AI?

Hallucination is when an AI model confidently states something that is factually wrong or entirely made up.

The short version

LLMs generate text by predicting the most likely next word. They're very good at producing text that sounds plausible. The problem is that "sounds plausible" and "is true" are not the same thing. A model can invent a citation that doesn't exist, attribute a quote to the wrong person, or fabricate a statistic with total confidence.
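
A toy sketch of that selection step, in Python. The candidate words and scores below are invented stand-ins for real model output; the point is only that the model ranks continuations by probability and emits the top one, with no check on whether it is true.

  import math

  def softmax(scores):
      # Turn raw scores into probabilities that sum to 1.
      exps = [math.exp(s) for s in scores]
      total = sum(exps)
      return [e / total for e in exps]

  # Hypothetical scores for the word that follows
  # "The first Moon landing took place in"
  candidates = ["1969", "1968", "1972", "July"]
  logits = [4.0, 1.5, 1.2, 0.8]  # invented numbers standing in for model output

  probs = softmax(logits)
  word, p = max(zip(candidates, probs), key=lambda pair: pair[1])
  print(f"chosen continuation: {word!r} (p = {p:.2f})")
  # Whatever scores highest gets emitted. When the training signal is thin,
  # the top-scoring continuation can sound plausible and still be wrong.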

This isn't a bug that will be fixed in the next version. It's a consequence of how these models work. They learned patterns in language, not a database of verified facts. When they don't have enough signal to give a correct answer, they fill the gap with something that fits the pattern.

How it works

Hallucinations happen for several reasons:

  • Knowledge gaps. The model was trained on data up to a cutoff date. It doesn't know about anything after that, but it will still try to answer rather than saying "I don't know."
  • Rare topics. The less training data exists on a subject, the more the model relies on pattern-matching rather than recall. Niche topics produce more hallucinations.
  • Confident framing. Most training data doesn't include hedging or uncertainty. The model learned to write assertively, so it states hallucinated facts with the same confidence as real ones.
  • Prompt pressure. If you ask a model to "list five examples" and only three exist, it will often invent two more rather than give you three.

Common hallucination patterns:

  • Fake citations and paper titles that sound real but don't exist
  • Correct-sounding URLs that lead nowhere
  • Plausible statistics with no source
  • Attributed quotes that the person never said
  • Code that looks right but references functions or APIs that don't exist

Why it matters

If you use AI to generate content, write code, or answer questions, you need to verify the output. Not because the model is unreliable on average, but because it gives no signal when it's wrong. It doesn't flag uncertainty. It doesn't say "I'm guessing here." The confidence is the same whether the answer is correct or fabricated.
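
For model-generated code, one cheap check is to confirm that the modules and functions it references actually exist before you run it. A minimal sketch in Python; the second attribute checked below is a made-up example of the kind of plausible name a model might invent.

  import importlib

  def attribute_exists(module_name, attr_path):
      # Return True if the module imports and the dotted attribute resolves on it.
      try:
          obj = importlib.import_module(module_name)
      except ImportError:
          return False
      for part in attr_path.split("."):
          if not hasattr(obj, part):
              return False
          obj = getattr(obj, part)
      return True

  print(attribute_exists("json", "loads"))         # True: real function
  print(attribute_exists("json", "parse_strict"))  # False: sounds right, doesn't exist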

RAG (retrieval-augmented generation) reduces hallucination by grounding the model in real data. Good prompting helps too, especially instructions like "if you're not sure, say so" or "only use information from the provided context." But nothing eliminates it entirely. Verification is part of the workflow.
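
A minimal sketch of that grounding step, assuming a toy keyword-overlap retriever in place of a real vector store. The documents and question are invented, and the actual model call is left out; the piece that matters is that the prompt restricts the model to the retrieved context and tells it to admit uncertainty.

  def retrieve(question, documents, k=2):
      # Rank documents by naive keyword overlap with the question.
      q_words = set(question.lower().split())
      ranked = sorted(documents,
                      key=lambda d: len(q_words & set(d.lower().split())),
                      reverse=True)
      return ranked[:k]

  def build_grounded_prompt(question, documents):
      context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
      return ("Answer using only the information in the context below. "
              "If the context does not contain the answer, say you don't know.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

  docs = [
      "Refunds are accepted within 30 days of purchase.",
      "Support hours are Monday to Friday, 9am to 5pm.",
  ]
  prompt = build_grounded_prompt("What is the refund window?", docs)
  print(prompt)
  # The prompt is then passed to whichever model client you use.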
