- An FFN subvalue can increase the probabilities of the tokens with the largest logits.
- It can reduce the probabilities of the tokens with the smallest logits.
- It can distinguish two tokens with different logits.
An FFN subvalue contains tens of thousands of token pairs, so a single FFN subvalue can distinguish many tokens. Finally, an FFN subvalue can act as a “query” that activates other FFN subvalues.
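
The three effects on token probabilities can be checked on a toy example. The sketch below is illustrative only: it assumes logits are a plain linear projection of the residual stream (no final layer normalization), and the names `unembed`, `residual`, and `subvalue`, the tiny dimensions, and the random values are all hypothetical stand-ins rather than anything taken from a real model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, unembed):
    """Logit of each token = dot product of its unembedding row with vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in unembed]


random.seed(0)
vocab_size, d_model = 8, 4  # toy sizes, assumed for illustration

# Hypothetical unembedding matrix, residual stream, and FFN subvalue.
unembed = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]
residual = [random.gauss(0, 1) for _ in range(d_model)]
subvalue = [random.gauss(0, 1) for _ in range(d_model)]

# Logits of the subvalue on its own, and probabilities before/after adding it.
sub_logits = project(subvalue, unembed)
p_before = softmax(project(residual, unembed))
p_after = softmax(project([r + v for r, v in zip(residual, subvalue)], unembed))

top = max(range(vocab_size), key=lambda t: sub_logits[t])
bottom = min(range(vocab_size), key=lambda t: sub_logits[t])

# Change in the log-odds between the two tokens caused by the subvalue.
log_odds_shift = (
    math.log(p_after[top] / p_after[bottom])
    - math.log(p_before[top] / p_before[bottom])
)

print("largest-logit token: ", p_before[top], "->", p_after[top])
print("smallest-logit token:", p_before[bottom], "->", p_after[bottom])
print("log-odds shift:", log_odds_shift, "logit gap:", sub_logits[top] - sub_logits[bottom])
```

Under this linear assumption, the token with the largest subvalue logit always gains probability and the token with the smallest always loses it, because the softmax normalizer is a weighted average of the exponentiated subvalue logits. For any token pair, the log-odds shift equals the difference of their subvalue logits, which is one way to read "distinguishing" two tokens.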