AI Knowledge Conflict Steering
AI Knowledge Conflict Benchmarks
Ambiguity
SAE-based steering
The model's judgment is determined by specific attention heads. In multimodal models, attention in later layers tends to draw on image information (non-commonsense information), while MLP layers lean toward commonsense knowledge. The influence of visual information is localized at the image-patch level and can be manipulated. This implies that by intervening on attention, we can shift the direction of a Vision-Language Model's (VLM's) decision between the visual evidence and its commonsense knowledge.
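The attention-manipulation idea can be illustrated with a minimal NumPy sketch. This is not the method of any particular paper or library. It assumes we already have one layer's pre-softmax attention logits, a boolean mask marking which key positions are image patches, and a hand-picked set of heads to steer. The helpers `steer_attention` and `image_attention_share`, the head indices, the toy sizes, and the scaling factor `alpha` are all hypothetical choices for illustration. In a real VLM the same reweighting would have to be applied inside the model's attention layers during the forward pass.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def steer_attention(scores, image_mask, heads, alpha):
    """Hypothetical helper: rescale attention to image-patch keys for selected heads.

    scores:     (num_heads, q_len, k_len) pre-softmax attention logits
    image_mask: (k_len,) bool, True where the key is an image patch
    heads:      indices of the heads to steer (assumed to be known in advance)
    alpha:      >1 amplifies attention to image patches, <1 suppresses it
    """
    probs = softmax(scores, axis=-1)
    steered = probs.copy()
    for h in heads:
        w = steered[h]                       # view into `steered` for head h
        w[:, image_mask] *= alpha            # reweight only the image-patch columns
        w /= w.sum(axis=-1, keepdims=True)   # renormalize each query row to sum to 1
    return steered


def image_attention_share(probs, image_mask):
    # Fraction of each query's attention mass that lands on image-patch keys.
    return probs[..., image_mask].sum(axis=-1)


# Toy setup (hypothetical): 4 heads, 3 query tokens, 10 keys; keys 2..7 are image patches.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 3, 10))
image_mask = np.zeros(10, dtype=bool)
image_mask[2:8] = True

base = softmax(scores)
# Steer heads 1 and 3 toward the image, or away from it (assumed to favor commonsense).
toward_image = steer_attention(scores, image_mask, heads=[1, 3], alpha=3.0)
away_from_image = steer_attention(scores, image_mask, heads=[1, 3], alpha=0.3)

# Mean image-attention share per head, before and after steering.
print(image_attention_share(base, image_mask).mean(axis=-1))
print(image_attention_share(toward_image, image_mask).mean(axis=-1))
print(image_attention_share(away_from_image, image_mask).mean(axis=-1))
```

Rescaling the post-softmax weights and renormalizing is equivalent to adding `log(alpha)` to the image-patch logits before the softmax, so the intervention can be applied at either point. Narrowing `image_mask` to a few patches gives the localized, patch-level version of the manipulation described above; heads not listed in `heads` are left unchanged.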