Attention maps show a strong vertical line at the system prompt or BOS token, which absorbs a large share (>50%) of the attention score.
LLMs aggregate information through Pyramidal Information Funneling: attention scatters widely in the lower layers and progressively focuses on critical tokens in the upper layers.
Notions related to attention sinks:
- Attention sink
- Null attention
- Massive activation
Sink tokens are typically tokens with weak semantics: special tokens, delimiters, conjunctions, prepositions, the first token, and number tokens.
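The sink effect above can be illustrated with a toy causal-attention computation. This is a minimal sketch, not any model's actual weights: the bias added to the first token's scores is a stand-in for the massive-activation effect that makes the BOS-like token dominate the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 8, 16  # sequence length, head dimension (illustrative sizes)
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))

scores = Q @ K.T / np.sqrt(d)
# Stand-in for a massive activation: the first (BOS-like) token's keys
# produce outsized logits for every query.
scores[:, 0] += 5.0
# Causal mask: token i may only attend to tokens <= i.
scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf

attn = softmax(scores, axis=-1)
sink_mass = attn[:, 0].mean()
print(f"mean attention mass on first token: {sink_mass:.2f}")
```

Plotting `attn` as a heatmap would show exactly the strong vertical line at column 0 described above; most of each row's probability mass lands on the sink token even though it carries little semantic content.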