Internal Interface Theory

Created
2024 Oct 24 23:08
Creator
Seonglae Cho
Edited
2024 Nov 21 20:58
Refs
The Internal Interfaces Theory suggests that in any complex AI system with multiple specialized modules, the modules communicate through compact internal interfaces: one component writes information into a low-dimensional subspace that later components read out.
Inter-layer Communication in Transformers

Models write features into low-rank subspaces of the residual stream, which specific later layers then read out, forming low-rank communication channels between layers.
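A minimal sketch of this write/read pattern, assuming hypothetical dimensions (`d_model = 64`, a rank-2 channel) and random write/read matrices rather than any trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_channel = 64, 2  # hypothetical sizes, not from a real model

# Write matrix of an earlier layer: maps a 2-dim feature into the
# 64-dim residual stream, i.e. into a rank-2 subspace.
W_write = rng.standard_normal((d_model, d_channel))
# Read matrix of a later layer; here constructed to span the same
# subspace so the message gets through (an assumption of this toy).
W_read = W_write @ rng.standard_normal((d_channel, d_channel))

feature = np.array([1.5, -0.7])                 # low-dim message
residual = rng.standard_normal(d_model)         # unrelated stream content
residual_after = residual + W_write @ feature   # additive write

# The later layer sees the stream only through W_read.T, so the
# effective layer-to-layer channel W_read.T @ W_write has rank <= 2.
channel = W_read.T @ W_write
readout = W_read.T @ residual_after
print(np.linalg.matrix_rank(channel))
```

The point of the sketch is that however large `d_model` is, the composition of one layer's write directions with another layer's read directions is a matrix whose rank is bounded by the channel width, which is what "low-rank communication channel" refers to.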
The Inhibition Head influences the Mover Head in subsequent layers, guiding the model to reduce attention on irrelevant items when selecting an item to recall. This mechanism enables the model to suppress unnecessary elements and highlight the relevant ones during simple recall tasks. The study found that the model autonomously learns these suppression and moving structures during training, contributing positive evidence that intricate, content-independent structure emerges from self-supervised pretraining.
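The suppression effect can be illustrated with a toy calculation (this is not the trained circuit: the item logits and the inhibition values below are made-up numbers chosen only to show the softmax shift):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # stable softmax
    return e / e.sum()

# Attention logits of a "mover" head over three candidate items;
# on its own it has almost no preference.
logits = np.array([2.0, 1.8, 1.9])

# Hypothetical inhibition signal from an earlier head: it pushes down
# the two irrelevant items before the mover head attends.
inhibition = np.array([0.0, 5.0, 5.0])

attn = softmax(logits - inhibition)
print(attn.round(3))  # nearly all attention lands on the first item
```

Because softmax is exponential in the logits, even a moderate inhibition bias is enough to collapse near-uniform attention onto the single uninhibited item, which matches the described role of the Inhibition Head in sharpening what the Mover Head copies.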