Uncertainty of AI (Epsilon Greedy, Creativity, Extrapolation)
Hallucinations in robotics models pose significant physical dangers, whereas a language model's hallucinations merely produce incorrect information. Mechanistic interpretability offers a promising and explicit way to control AI.
LLMs hallucinate more when fine-tuned on new factual knowledge, because they learn new information more slowly than knowledge consistent with what they already know.
AI Hallucination Notion
The Internal State of an LLM Knows When It’s Lying
Bigger AI chatbots more inclined to spew nonsense
Masking retrieval heads or the relevant induction heads can induce hallucinations
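To make the idea of masking a head concrete, here is a minimal sketch of zero-ablation in a toy two-head attention layer. It is not the procedure from any of the works listed above. The function names (`head_output`, `multi_head`), the `head_mask` convention, and all inputs are hypothetical, chosen so that head 0 behaves like a retrieval head that copies one "needle" token from the context.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def head_output(query, keys, values):
    # Single-head scaled dot-product attention for one query position.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def multi_head(queries, keys, values, head_mask):
    # Sum the per-head outputs; a mask entry of 0.0 ablates that head.
    out = [0.0] * len(values[0][0])
    for q, k, v, m in zip(queries, keys, values, head_mask):
        h = head_output(q, k, v)
        out = [o + m * x for o, x in zip(out, h)]
    return out

# Hypothetical toy inputs with three context tokens.
# Head 0 attends sharply to token 1 (the "needle") and copies its value.
q_h0 = [10.0, 0.0]
keys_h0 = [[0.0, 0.0], [10.0, 0.0], [0.0, 0.0]]
vals_h0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
# Head 1 attends uniformly and carries unrelated information.
q_h1 = [0.0, 0.0]
keys_h1 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
vals_h1 = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

queries = [q_h0, q_h1]
keys = [keys_h0, keys_h1]
values = [vals_h0, vals_h1]

full = multi_head(queries, keys, values, head_mask=[1.0, 1.0])
masked = multi_head(queries, keys, values, head_mask=[0.0, 1.0])

print("all heads active:", full)
print("head 0 masked:   ", masked)
```

With head 0 masked, the first output dimension (the retrieved "fact") drops to zero while the second is unchanged. In a real model, downstream layers would then have to produce an answer without the retrieved content, which is one intuition for why such masking could lead to hallucinated output.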