Uncertainty of AI (Epsilon Greedy, Creativity, Extrapolation)
Hallucinations in robotics models pose significant physical dangers, unlike language-model hallucinations, which merely provide incorrect information. Mechanistic interpretability offers a promising and explicit method to control AI behavior.
LLMs hallucinate more after being fine-tuned on new factual knowledge, since they acquire unfamiliar facts more slowly than knowledge consistent with what they already know.
Transformers internally perform a kind of stochastic reasoning, representing the possible semantics of a context as a noisy superposition. The more uncertain the input, the more concepts are activated. (Transluce Monitor demo, Statistical Thinking)

Triggering Prompts
- Questions about non-existent terms or concepts
- Domains the model handles inconsistently, such as numbers and dates
AI Hallucination Notion
LLMs Will Always Hallucinate, and We Need to Live With This
As Large Language Models become more ubiquitous across domains, it becomes important to examine their inherent limitations critically. This work argues that hallucinations in language models are...
https://arxiv.org/abs/2409.05746

The Internal State of an LLM Knows When It’s Lying
Bigger AI chatbots more inclined to spew nonsense — and people don't always realize
Nature - Artificial-intelligence models are improving overall but are more likely to answer every question, leading to wrong answers.
https://www.nature.com/articles/d41586-024-03137-3

The "be concise" instruction reduces counter-explanations, decreasing accuracy by up to 20%. When questions are posed with high confidence, the model is up to 15% more likely to agree with false claims.
huggingface.co
https://huggingface.co/blog/davidberenstein1957/phare-analysis-of-hallucination-in-leading-llms
Hallucination is not a mysterious phenomenon but a natural result of statistical classification error. Even with perfect data, errors inevitably occur during cross-entropy minimization. Viewing the model as a binary classifier of whether an output is correct, hallucinations necessarily occur on unlearnable patterns (e.g., rarely appearing birthday information), where the classification error rate approaches 50%. By Good–Turing estimation, the hallucination rate is lower-bounded by the proportion of facts that appear only once (singletons). While RLHF reduces some hallucinations, most evaluations use binary correct/incorrect grading (accuracy), so guessing scores higher than answering IDK. This optimizes models to always give overconfident answers in a "test-taking regime".
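A minimal sketch of the singleton bound above, under this page's reading that the bound equals the fraction of distinct facts seen exactly once in training data; the fact table is a made-up toy, not data from the paper.

```python
from collections import Counter

def singleton_rate(fact_counts: Counter) -> float:
    """Fraction of distinct facts that appear exactly once in the corpus.
    Per the Good-Turing argument above, this fraction lower-bounds the
    hallucination rate on questions about such facts."""
    singletons = sum(1 for count in fact_counts.values() if count == 1)
    return singletons / len(fact_counts)

# Toy fact-frequency table (illustrative): famous birthdays repeat often,
# while most others appear exactly once and are effectively unlearnable.
facts = Counter({
    ("Alan Turing", "1912-06-23"): 87,
    ("Ada Lovelace", "1815-12-10"): 54,
    ("person_A", "1983-07-02"): 1,
    ("person_B", "1990-11-19"): 1,
    ("person_C", "1975-01-30"): 1,
})
print(f"Singleton rate (lower bound on hallucination): {singleton_rate(facts):.2f}")
# -> 0.60: with 3 of 5 facts seen once, expect errors on at least ~60% of them.
```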
Rather than developing new hallucination evaluations, we should modify mainstream benchmarks (MMLU, GPQA, SWE-bench, etc.) so they do not penalize IDK or expressions of uncertainty. Alternatively, problem instructions could state explicit confidence thresholds (t = 0.5, 0.75, etc.) and wrong-answer penalties, encouraging behavioral calibration in which models answer only when they are sufficiently confident.
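A small sketch of how such a threshold rubric plays out. The penalty value is an assumption here: one natural choice is to score +1 for a correct answer, 0 for IDK, and -t/(1-t) for a wrong answer, under which answering has positive expected score exactly when the model's confidence exceeds t.

```python
def expected_score(p_correct: float, t: float) -> float:
    """Expected score under a threshold-t rubric: +1 if correct,
    -t/(1-t) if wrong, 0 for abstaining (IDK).
    Answering beats abstaining exactly when p_correct > t."""
    penalty = t / (1.0 - t)
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

for t in (0.5, 0.75, 0.9):
    for p in (0.4, 0.6, 0.8, 0.95):
        decision = "answer" if expected_score(p, t) > 0 else "IDK"
        print(f"t={t:.2f} confidence={p:.2f} -> E[score]={expected_score(p, t):+.2f} ({decision})")
```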
Why language models hallucinate
OpenAI’s new research explains why language models hallucinate. The findings show how improved evaluations can enhance AI reliability, honesty, and safety.
https://openai.com/index/why-language-models-hallucinate/

cdn.openai.com
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
Transformers assign meaning even when inputs are meaningless. The model has an inductive bias that forcibly projects semantic structure onto any input. The more uncertain the input, the more concepts are activated (especially in middle layers); the more ambiguous the information, the more semantic structure the model creates on its own (conceptual wandering). Concept-activation vectors are extracted from input documents through an SAE, and hallucination scores are then predicted with a 4-component PLS regression. SAE-based predictions detect hallucinations more accurately than non-SAE approaches. Note that using an LLM-based detector such as HHEM for LLM hallucination detection may itself introduce inductive bias. (AI Knowledge Conflict)
Multicollinearity: PLS regression suits this setup because the SAE concept activations are highly correlated with one another.
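A minimal sketch of that prediction step, assuming per-document SAE concept activations are already available as a feature matrix; the data below is synthetic and the hallucination score is a stand-in, purely for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for SAE concept-activation vectors: in the real setup these would be
# per-document SAE latent activations extracted from the LLM. Here we simulate
# highly correlated (multicollinear) features from a few shared factors, plus a
# synthetic "hallucination score" target.
n_docs, n_latents = 500, 1024
shared_factors = rng.normal(size=(n_docs, 8))
mixing = rng.normal(size=(8, n_latents))
X = np.maximum(shared_factors @ mixing + 0.1 * rng.normal(size=(n_docs, n_latents)), 0.0)
y = shared_factors @ rng.normal(size=8) + 0.1 * rng.normal(size=n_docs)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 4-component PLS regression, as described above: PLS projects the correlated
# SAE features onto a few latent directions before regressing on the score.
pls = PLSRegression(n_components=4)
pls.fit(X_train, y_train)
print("R^2 on held-out documents:", round(pls.score(X_test, y_test), 3))
```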

Seonglae Cho