PvP
Pixels Versus Priors (PvP) studies how a multimodal model handles visual information versus prior knowledge. It uses early decoding, which reads out predictions from intermediate layers. With it, PvP observes a 'flip': predictions at early layers rely on prior knowledge, but visual information overturns them in the middle and late layers. PvP then controls whether the model relies more on visual input or on prior knowledge by adding steering vectors to its activations.
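A minimal sketch of how such a flip could be detected with logit-lens-style early decoding. Several things here are assumptions rather than PvP's actual setup. The per-layer hidden states at the answer position and the unembedding matrix `W_U` are given as plain arrays. A parameter-free layer norm stands in for the model's learned final norm. The helper names `early_decode` and `find_flip` are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the last axis; a stand-in (assumption)
    # for the model's learned final normalization.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def early_decode(hidden_states, W_U):
    """Project every layer's hidden state through the unembedding.

    hidden_states: [num_layers, d_model] residual stream at the answer position.
    W_U: [d_model, vocab] unembedding matrix.
    Returns the argmax token id predicted at each layer.
    """
    logits = layer_norm(hidden_states) @ W_U
    return logits.argmax(axis=-1)

def find_flip(preds, prior_id, visual_id):
    """Return the first layer predicting visual_id after an earlier layer
    predicted prior_id, or None if no such flip occurs."""
    seen_prior = False
    for layer, tok in enumerate(preds):
        if tok == prior_id:
            seen_prior = True
        elif tok == visual_id and seen_prior:
            return layer
    return None

# Toy example: 4 layers, d_model = 4, vocab = 3.
# Token 1 plays the "prior" answer and token 2 the "visual" answer.
hidden = np.array([
    [0.1, 2.0, 0.0, 0.0],
    [0.2, 3.0, 1.0, 0.0],
    [0.0, 1.0, 2.5, 0.0],
    [0.0, 0.5, 4.0, 0.0],
])
W_U = np.eye(4)[:, :3]  # toy unembedding: reads the first three dimensions
preds = early_decode(hidden, W_U)
flip_layer = find_flip(preds, prior_id=1, visual_id=2)
```

In this toy input, the early layers predict the prior token and the later layers predict the visual token, so `flip_layer` marks the layer where the reversal happens. In a real model, `hidden` would come from the model's per-layer outputs rather than a hand-written array.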
Fundamentally, this steering is no different from CAA (Contrastive Activation Addition) and ActAdd (Activation Addition). What PvP adds is the application of these ideas to multimodal, knowledge-conflict scenarios, along with its own dataset.
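To illustrate the shared idea, here is a minimal numpy sketch of activation-addition steering. It uses the mean-difference construction associated with CAA; ActAdd instead derives the vector from a single contrasting prompt pair. The contrastive activation sets and the coefficient `alpha` are hypothetical placeholders. The layer and token positions at which PvP actually intervenes are not specified here.

```python
import numpy as np

def steering_vector(acts_visual, acts_prior):
    """CAA-style steering vector: difference of mean activations between
    visually grounded and prior-driven examples at one layer.

    acts_*: [num_examples, d_model] (hypothetical inputs).
    """
    return acts_visual.mean(axis=0) - acts_prior.mean(axis=0)

def steer(hidden, v, alpha):
    """Activation addition: add alpha * v to the residual stream.
    alpha > 0 pushes toward the visual side, alpha < 0 toward the prior side."""
    return hidden + alpha * v

# Toy data: the two sets differ along dimension 0 only.
rng = np.random.default_rng(0)
d = 8
direction = np.zeros(d)
direction[0] = 1.0
acts_visual = rng.normal(size=(16, d)) + 2.0 * direction
acts_prior = rng.normal(size=(16, d)) - 2.0 * direction
v = steering_vector(acts_visual, acts_prior)

h = rng.normal(size=d)             # an unsteered hidden state
h_visual = steer(h, v, alpha=1.0)  # nudged toward the visual direction
h_prior = steer(h, v, alpha=-1.0)  # nudged toward the prior direction
```

Because `(h + alpha * v) @ v = h @ v + alpha * ||v||^2`, a positive `alpha` always increases the projection onto the steering direction and a negative one decreases it. The sign and magnitude of `alpha` therefore act as the single knob for trading off visual input against priors.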