AI Knowledge Conflict

AI Knowledge Conflict Steering

Pixels Versus Priors

CSC patching

AI Knowledge Conflict Benchmarks

NQSwap

MacNoise

Visual CounterFact

Ambiguity

https://aclanthology.org/2024.mrl-1.26.pdf

SAE-based steering to prevent knowledge conflict

https://arxiv.org/pdf/2410.15999

A small set of attention heads largely determines which source the model follows when its inputs and its stored knowledge disagree. In multimodal models, attention in the later layers tends to favor image information (which may contradict common sense), while the MLPs lean toward commonsense prior knowledge. The influence of visual information is localized at the image-patch level and can be manipulated. Intervening on attention can therefore shift the decision of a Vision-Language Model (VLM) toward either the visual evidence or its priors.

https://arxiv.org/pdf/2507.13868
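
To make the attention-manipulation idea above concrete, here is a minimal toy sketch in NumPy. It is not the method from the linked paper: the single attention head, the random keys and values, and the `image_bias` parameter are all hypothetical, chosen only to show that adding a bias to the logits of image-patch positions moves attention mass (and hence the head's output) toward the image tokens.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, keys, values, image_mask, image_bias=0.0):
    """Toy single-head attention with an additive bias on image-patch logits.

    image_mask is 1.0 at image-patch positions and 0.0 at text positions.
    image_bias is a hypothetical steering knob, not a real model parameter.
    """
    logits = keys @ query / np.sqrt(query.shape[0])
    logits = logits + image_bias * image_mask  # steer toward (or away from) patches
    weights = softmax(logits)
    return weights @ values, weights

# Hypothetical setup: 4 image-patch tokens followed by 4 text tokens.
rng = np.random.default_rng(0)
d, n_img, n_txt = 8, 4, 4
keys = rng.normal(size=(n_img + n_txt, d))
values = rng.normal(size=(n_img + n_txt, d))
query = rng.normal(size=d)
image_mask = np.array([1.0] * n_img + [0.0] * n_txt)

out_base, w_base = attend(query, keys, values, image_mask)
out_steered, w_steered = attend(query, keys, values, image_mask, image_bias=2.0)

# Total attention mass placed on image patches before and after steering.
image_mass_base = w_base[image_mask == 1].sum()
image_mass_steered = w_steered[image_mask == 1].sum()
print(f"image attention mass: {image_mass_base:.3f} -> {image_mass_steered:.3f}")
```

A positive `image_bias` always increases the share of attention on image patches, and a negative one decreases it. In a real VLM the analogous intervention would be applied to selected heads in the later layers, typically through a forward hook, rather than to a standalone function like this one.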