Repeated Token Phenomenon

Creator: Seonglae Cho
Created: 2025 Aug 1 16:33
Edited: 2025 Aug 1 16:38

One explanation for why cycles of repeated tokens cause divergence ties the phenomenon to the Attention Sink. The first layer fails to distinguish the first token from repeated identical tokens and incorrectly marks the repeated tokens as sinks, which produces abnormal attention and leads to divergence. As the repetition length n increases, the representation of the last repeated token converges to the representation of a single-token sequence.

Cluster attack: even without exact repetition, the same collapse can be induced by repeatedly placing sets of similar tokens (clusters) that trigger the same attention heads.
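The two input patterns described above, and the convergence claim, can be probed with a small harness. The following is a minimal sketch, not the method of any particular paper: the names `repeated_prompt`, `cluster_prompt`, and `convergence_curve` are hypothetical, and `last_hidden` is an assumed callable that the reader supplies to map a list of token ids to the model's last-position representation.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def repeated_prompt(token_id: int, n: int) -> List[int]:
    """Token ids for one token repeated n times (exact repetition)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [token_id] * n


def cluster_prompt(cluster: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """n token ids sampled from a small set of similar tokens.

    Assumption: `cluster` was chosen by the reader so that its members
    trigger the same attention heads; this function does not check that.
    """
    if not cluster:
        raise ValueError("cluster must not be empty")
    rng = random.Random(seed)
    members = list(cluster)
    return [rng.choice(members) for _ in range(n)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def convergence_curve(
    last_hidden: Callable[[List[int]], Vector],
    token_id: int,
    lengths: Sequence[int],
) -> List[float]:
    """Similarity between the last-token representation of [t] * n and of [t].

    `last_hidden` is a hypothetical hook into the reader's own model.
    """
    single = last_hidden(repeated_prompt(token_id, 1))
    return [
        cosine(last_hidden(repeated_prompt(token_id, n)), single)
        for n in lengths
    ]
```

If the convergence described above holds for a given model, the values returned by `convergence_curve` should approach 1 as the lengths grow. Passing the output of `cluster_prompt` to the same `last_hidden` hook gives a comparable check for the cluster variant.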
 
 
