AI Knowledge Conflict

Creator: Seonglae Cho
Created: 2024 Nov 20 0:12
Edited: 2025 Aug 4 21:30
AI Knowledge Conflict Steering

AI Knowledge Conflict Benchmarks

Ambiguity
SAE based steering
When a model faces a knowledge conflict, its judgment is determined largely by specific attention heads. In multimodal models, attention in the later layers tends to follow the image information (non-commonsense), while the MLPs lean toward commonsense knowledge. The influence of visual information is localized at the level of individual image patches and can be manipulated. By intervening on attention, we can therefore change the decision direction of Vision-Language Models (VLMs).
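A minimal sketch of what "manipulating attention" could look like, on a toy single-head example rather than a real VLM. It assumes the intervention is an additive bias on the pre-softmax scores of image-patch tokens; this is one common way to steer attention, not necessarily the method used in the work summarized above. The names `steer_attention`, `image_mass`, the token layout, and the score values are all hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def steer_attention(scores, image_positions, bias):
    """Add `bias` to the pre-softmax scores of image-patch tokens,
    then renormalize. Positive bias pushes attention toward the image,
    negative bias pushes it toward the text tokens."""
    image_positions = set(image_positions)
    steered = [
        s + bias if i in image_positions else s
        for i, s in enumerate(scores)
    ]
    return softmax(steered)


def image_mass(weights, image_positions):
    """Total attention weight placed on image-patch tokens."""
    return sum(weights[i] for i in image_positions)


# Toy setup (assumed): one query attending to six key tokens.
# Positions 0-3 stand in for image patches, positions 4-5 for text tokens.
scores = [0.2, 0.1, 0.0, 0.3, 1.5, 1.2]
image_positions = [0, 1, 2, 3]

baseline = softmax(scores)
boosted = steer_attention(scores, image_positions, bias=2.0)
suppressed = steer_attention(scores, image_positions, bias=-2.0)

# Patch-level intervention: boost only a single patch (position 3).
single_patch = steer_attention(scores, [3], bias=2.0)

print("image mass, baseline:  ", round(image_mass(baseline, image_positions), 3))
print("image mass, boosted:   ", round(image_mass(boosted, image_positions), 3))
print("image mass, suppressed:", round(image_mass(suppressed, image_positions), 3))
print("weight on patch 3, baseline vs. boosted:",
      round(baseline[3], 3), round(single_patch[3], 3))
```

In a real VLM the same idea would be applied inside selected heads and layers, for example through a forward hook, and the choice of which heads to target matters as much as the size of the bias.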
 
 
 
