Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks
Hallucinations remain a significant challenge in current Generative AI models, undermining trust in AI systems and their reliability. This study investigates how orchestrating multiple specialized Artificial Intelligence agents can help mitigate such hallucinations, with a focus on systems leveraging Natural Language Processing (NLP) to facilitate seamless agent interactions. To achieve this, we design a pipeline that feeds over three hundred prompts, purposefully crafted to induce hallucinations, into a front-end agent. The outputs are then systematically reviewed and refined by second- and third-level agents, each employing distinct large language models and tailored strategies to detect unverified claims, incorporate explicit disclaimers, and clarify speculative content.
Additionally, we introduce a set of novel Key Performance Indicators (KPIs) specifically designed to evaluate hallucination score levels. These metrics offer a structured and quantifiable framework for assessing the impact of each agent’s refinements on the factuality and clarity of AI-generated responses. A dedicated fourth-level AI agent is employed to evaluate these KPIs, providing detailed assessments and ensuring accurate quantification of shifts in hallucination-related behaviors.
A core component of this investigation is the use of the OVON (Open Voice Network) framework, which relies on universal NLP-based interfaces to transfer contextual information among agents. Through structured JSON messages, each agent communicates its assessment of the hallucination likelihood and the reasons underlying questionable content, thereby enabling the subsequent stage to refine the text without losing context. Experimental results suggest that this multi-agent, JSON-based approach not only lowers the overall hallucination scores but also renders speculative content more transparent and clearly demarcated from factual claims, improving the AI explainability level.
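To make the JSON hand-off between agents concrete, here is a minimal Python sketch. The field names (`agent_level`, `hallucination_likelihood`, `reasons`), the `[Unverified]` prefix, and the 0.5 threshold are all illustrative assumptions; this is not the OVON message schema, and the real agents are LLM-backed rather than rule-based stubs.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class ReviewMessage:
    """Hypothetical inter-agent message (illustrative fields, not the OVON schema)."""
    agent_level: int
    text: str
    hallucination_likelihood: float  # 0.0 = well grounded, 1.0 = likely fabricated
    reasons: List[str] = field(default_factory=list)


def to_wire(msg: ReviewMessage) -> str:
    """Serialize the message to a JSON string for the next agent."""
    return json.dumps(asdict(msg), ensure_ascii=False)


def from_wire(payload: str) -> ReviewMessage:
    """Parse a JSON string back into a message."""
    return ReviewMessage(**json.loads(payload))


def add_disclaimer(msg: ReviewMessage, threshold: float = 0.5) -> ReviewMessage:
    """Stub for a next-level agent: mark text that upstream flagged as likely hallucinated."""
    text = msg.text
    if msg.hallucination_likelihood >= threshold:
        text = "[Unverified] " + text
    # Carry the likelihood and reasons forward so context is not lost downstream.
    return ReviewMessage(msg.agent_level + 1, text,
                         msg.hallucination_likelihood, list(msg.reasons))
```

Because the likelihood and the reasons travel with the text, a later stage can decide how strongly to hedge without having to re-derive why the content was flagged.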
Our findings underscore the feasibility of multi-agent orchestration and highlight the importance of maintaining a structured exchange of meta-information - particularly through formats supporting Natural Language API - to enhance the reliability and interpretability of AI-generated responses.
The results demonstrate that employing multiple specialized agents capable of interoperating with each other through NLP-based agentic frameworks - such as the OVON framework - can yield promising outcomes in hallucination mitigation, ultimately bolstering trust within the AI community.
https://arxiv.org/html/2501.13946v1
Impressive examples that fix LLM reasoning errors
Monitor: An AI-Driven Observability Interface
This write-up is a technical demonstration, which describes and evaluates the use of a new piece of technology. For technical demonstrations, we still run systematic experiments to test our findings, but do not run detailed ablations and controls. The claims are ones that we have tested and stand behind, but have not vetted as thoroughly as in our research reports.
https://transluce.org/observability-interface

Demo
Transluce Monitor
https://monitor.transluce.org/dashboard/chat
Prior work has mainly focused on biases in attention maps, but these reflect only forward-pass information and cannot accurately explain how influence actually propagates between tokens. To bridge this gap, we propose LVLMs-Saliency, a diagnostic tool that combines attention weights and gradients to quantify the evidential strength of output tokens. We observe a pattern in which the saliency score affecting next-token prediction drops sharply and contextual connectivity collapses. The score is defined from the attention matrix and the gradients of the loss function $\mathcal{L}(x)$; this metric shows numerically that hallucinations begin when the model effectively "forgets" the prior context.
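As a rough illustration of combining attention weights with gradients, the sketch below uses the elementwise product $|A \odot \partial\mathcal{L}/\partial A|$, a common gradient-times-attention form. Whether this matches the paper's exact definition is an assumption, and the function names and the plain list-of-lists matrices are purely illustrative.

```python
def saliency(attn, grad):
    """Elementwise |A * dL/dA| for two same-shaped matrices given as lists of lists.

    Assumed form: the paper's exact definition may differ.
    """
    return [[abs(a * g) for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(attn, grad)]


def token_saliency(attn, grad, query_index):
    """Sum of saliency from all earlier positions into the token at query_index."""
    row = saliency(attn, grad)[query_index]
    return sum(row[:query_index])  # prior context only, excluding the token itself
```

A sharp fall in `token_saliency` from one generated token to the next would correspond to the "saliency drop" described above.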
The first method, Saliency-Guided Rejection Sampling (SGRS), rejects and resamples candidate tokens whose saliency falls below an adaptive threshold during decoding, proactively preventing the injection of tokens that would break contextual coherence. The second method, Local Coherence Reinforcement (LocoRE), maintains local consistency by using a gain factor to reinforce the attention weights from the current token to the most recently generated tokens.
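The two methods can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the paper's implementation: the threshold rule (`ratio` times the mean saliency of previously accepted tokens), the retry limit, the fallback to the most salient candidate, and the renormalization step in `locore` are all hypothetical choices made for illustration.

```python
import random


def sgrs_sample(candidates, probs, saliency_of, history,
                ratio=0.5, max_tries=5, rng=random):
    """Sample a token, rejecting draws whose saliency is below an adaptive threshold.

    Assumed rule: threshold = ratio * mean(history), where history holds the
    saliency scores of previously accepted tokens.
    """
    threshold = ratio * (sum(history) / len(history)) if history else 0.0
    for _ in range(max_tries):
        token = rng.choices(candidates, weights=probs, k=1)[0]
        if saliency_of(token) >= threshold:
            return token
    # Every draw was rejected: fall back to the most salient candidate.
    return max(candidates, key=saliency_of)


def locore(attn_row, window=2, gain=1.5):
    """Scale attention to the last `window` positions by `gain`, then renormalize."""
    n = len(attn_row)
    boosted = [w * gain if i >= n - window else w for i, w in enumerate(attn_row)]
    total = sum(boosted)
    return [w / total for w in boosted] if total > 0 else boosted
```

In this sketch a high-probability but low-saliency candidate is never emitted, and `locore` shifts attention mass toward recent tokens while keeping the row a valid distribution.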
Hallucination Begins Where Saliency Drops
Recent studies have examined attention dynamics in large vision-language models (LVLMs) to detect hallucinations. However, existing approaches remain limited in reliably distinguishing...
https://arxiv.org/abs/2601.20279


Seonglae Cho