The Internal Interfaces Theory suggests that in any complex AI system with multiple specialized modules, internal interfaces emerge through which those modules exchange information.
Inter-layer Communication in Transformers
Models write features into low-rank subspaces of the residual stream, which are then read out by specific later layers; these write/read pairs form low-rank communication channels between layers.
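To make the idea concrete, here is a minimal numerical sketch of such a channel. The rank-4 subspace, the write matrix `W_write`, and the pseudoinverse read-out `W_read` are illustrative assumptions, not the exact mechanism learned by any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_channel = 64, 4          # residual stream width, channel rank

# An early layer writes a feature into a rank-4 subspace of the residual
# stream; a later layer reads it back with a matching projection.
W_write = rng.standard_normal((d_model, d_channel)) / np.sqrt(d_channel)
W_read = np.linalg.pinv(W_write)    # reader aligned with the writer's subspace

feature = rng.standard_normal(d_channel)         # what the early layer wants to send
residual = 0.1 * rng.standard_normal(d_model)    # unrelated residual-stream content

residual = residual + W_write @ feature          # early layer writes the feature
decoded = W_read @ residual                      # later layer reads the channel

print("sent:   ", np.round(feature, 2))
print("decoded:", np.round(decoded, 2))          # approximate recovery despite the noise
```

Because the channel occupies only a few directions of the residual stream, unrelated content written elsewhere barely perturbs the read-out.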
The Inhibition Head influences the Mover Head in later layers, guiding the model to reduce attention on irrelevant items when selecting the item to recall. This mechanism lets the model suppress distracting elements and highlight the relevant ones during simple recall tasks. Notably, the study found that the model learns these suppression and moving structures autonomously during training. This result provides positive evidence that intricate, content-independent structure emerges from self-supervised pretraining.
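A toy numerical sketch of how such suppression could play out is below. The two keys, the query, and the form and scale of the inhibition signal are all assumptions made for illustration, not the exact circuit identified in the study:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_head = 16

# Two candidate items sit in the context, but only item B should be recalled.
key_A = rng.standard_normal(d_head)   # key of the irrelevant item
key_B = rng.standard_normal(d_head)   # key of the item to recall
keys = np.stack([key_A, key_B])

# Without inhibition, the mover head's query overlaps with both keys,
# so its attention is split between the two items.
query = key_A + key_B
print("attention before inhibition:", softmax(keys @ query / np.sqrt(d_head)))

# The earlier inhibition head writes a signal into the residual stream at the
# query position; after the mover head's query projection it shows up as a
# component that cancels the irrelevant item's key direction (assumed form).
query = query - key_A
print("attention after inhibition: ", softmax(keys @ query / np.sqrt(d_head)))
```

In this toy version, the inhibition signal simply cancels the irrelevant item's contribution to the mover head's query, so attention collapses onto the item to recall.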