Repeated Token Phenomenon

The explanation for why token repetition cycles cause divergence is related to the Attention Sink: the first layer fails to distinguish between the first token and repeated identical tokens, incorrectly marking the repeats as sinks as well, which creates abnormal attention and leads to divergence. As the repetition length n increases, the representation of the last repeated token converges to the representation of a single-token sequence.

Cluster attack: even without exact repetition, the same collapse can be induced by repeatedly placing similar token sets (clusters) that trigger the same attention heads.
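A minimal sketch of how one might probe this first-layer behavior, assuming a Hugging Face causal LM; the model name ("gpt2"), the repeated prompt, and the sink heuristic are illustrative assumptions, not the original experiment's setup.

```python
# Sketch: inspect first-layer attention on a repeated-token prompt.
# Assumption: a small Hugging Face causal LM ("gpt2"); prompt and
# sink heuristic are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed small model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "the " * 32  # n repeated (nearly) identical tokens
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one tensor per layer, shape (batch, heads, seq, seq)
first_layer = out.attentions[0][0]      # (heads, seq, seq)
attn_from_last = first_layer[:, -1, :]  # how the last repeated token attends
mean_over_heads = attn_from_last.mean(dim=0)

# If repeated tokens are (mis)treated like extra sinks, attention mass from
# the last token spreads over the repeats instead of concentrating on the
# true first-position sink.
print("attention to position 0:       ", mean_over_heads[0].item())
print("attention to repeated tokens:  ", mean_over_heads[1:-1].sum().item())
```

Comparing these two quantities against a non-repetitive prompt of the same length would show whether the repeats are absorbing sink-like attention in layer 1.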