Token Entropy

Creator: Seonglae Cho
Created: 2025 Jun 27 15:08
Edited: 2025 Aug 27 00:36

Token Entropy (Qwen) represents the flatness of the next-token selection probability distribution and indicates whether that position is a reasoning branch point. In Chain of Thought (CoT) generation, about 80% of generated tokens had low entropy while 20% had high entropy. RLVR training largely preserves the token-entropy patterns of the base model, mainly adjusting only the high-entropy tokens, suggesting that controlling branch points is sufficient for reaching correct answers. This was shown experimentally: updating policy gradients using only the top 20% highest-entropy tokens maintained or improved reasoning performance compared to using all tokens.
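A minimal sketch of how per-token entropy and a top-20% gradient mask could be computed (PyTorch assumed; the function names and masking scheme are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution at each position.

    logits: (seq_len, vocab_size) raw model outputs.
    Returns: (seq_len,) entropy in nats; high values mark flat
    distributions, i.e. candidate reasoning branch points.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def top_entropy_mask(entropy: torch.Tensor, keep: float = 0.2) -> torch.Tensor:
    """Boolean mask selecting the top `keep` fraction of highest-entropy tokens."""
    k = max(1, int(keep * entropy.numel()))
    threshold = entropy.topk(k).values.min()
    return entropy >= threshold

# Hypothetical use in a policy-gradient update: zero out the loss on
# low-entropy tokens so only branch points receive gradient.
# ent = token_entropy(logits)
# mask = top_entropy_mask(ent)
# loss = (per_token_pg_loss * mask).sum() / mask.sum()
```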
Token entropy is also used by adding an entropy term to the advantage of all tokens; this shares the observation that forking (high-entropy) tokens mark reasoning branch points.
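One reading of "adding to the advantage of all tokens" is an entropy-shaped advantage; a sketch under that assumption (the coefficient `beta` and the detach choice are illustrative):

```python
import torch

def entropy_shaped_advantage(advantage: torch.Tensor,
                             entropy: torch.Tensor,
                             beta: float = 0.01) -> torch.Tensor:
    """Add a small entropy bonus to every token's advantage so that
    high-entropy (forking) tokens are reinforced more strongly.

    advantage, entropy: (seq_len,) per-token tensors.
    beta is an assumed small coefficient; entropy is detached so the
    bonus shapes the advantage without backpropagating through it.
    """
    return advantage + beta * entropy.detach()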
DeepConf calculates group confidence over sliding windows of tokens (e.g., the most recent 2k tokens) rather than scoring individual tokens. It discards low-confidence traces and votes only with high-confidence traces.
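A minimal sketch of windowed group confidence and trace filtering, assuming confidence is the mean token log-probability over a sliding window (the exact DeepConf scoring and threshold rule are assumptions):

```python
import numpy as np

def group_confidence(token_logprobs: np.ndarray, window: int = 2048) -> np.ndarray:
    """Sliding-window confidence over one trace.

    token_logprobs: (seq_len,) log-probability of each generated token.
    Confidence here is the mean log-prob over the most recent `window`
    tokens (an assumed proxy for DeepConf's group confidence).
    """
    n = len(token_logprobs)
    return np.array([token_logprobs[max(0, i - window + 1): i + 1].mean()
                     for i in range(n)])

def filter_traces(traces: list[np.ndarray], threshold: float) -> list[int]:
    """Keep indices of traces whose minimum group confidence clears the
    threshold; only these traces participate in the majority vote."""
    return [i for i, lp in enumerate(traces)
            if group_confidence(lp).min() >= threshold]
```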