Token Entropy (Qwen)
Token entropy represents the flatness of the next-token probability distribution and indicates whether a position is a reasoning branch point. In Chain of Thought (CoT) generation, about 80% of tokens have low entropy while about 20% have high entropy. RLVR training largely preserves the base model's token-entropy patterns and mainly adjusts only the high-entropy tokens, which suggests that controlling branch points is sufficient for reaching correct answers. This was shown experimentally: restricting policy-gradient updates to only the top 20% highest-entropy tokens maintained or improved reasoning performance compared to updating on all tokens.
https://www.arxiv.org/pdf/2506.01939
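The top-20% experiment can be sketched as below. This is a minimal NumPy sketch, not the paper's code: the function names, the rounding-based threshold, and the simple REINFORCE-style loss are all assumptions for illustration.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of each next-token distribution. logits: (T, V)."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)      # (T,)

def high_entropy_mask(logits, top_frac=0.2):
    """Boolean mask selecting the top-`top_frac` highest-entropy positions."""
    h = token_entropy(logits)
    k = max(1, int(round(top_frac * h.size)))
    threshold = np.sort(h)[-k]
    return h >= threshold

def masked_pg_loss(logp_actions, advantages, mask):
    """REINFORCE-style loss averaged over the retained (high-entropy) tokens."""
    mask = mask.astype(float)
    return -(mask * logp_actions * advantages).sum() / mask.sum()
```

With `top_frac=0.2` only the ~20% of positions where the model is most uncertain (the branch points) contribute to the gradient; the remaining low-entropy tokens are masked out.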
Entropy Advantage
Here entropy is used by adding an entropy-based term to the advantage of every token. It shares the observation that high-entropy forking tokens mark reasoning branch points.
https://www.arxiv.org/pdf/2506.14758
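The shaping idea reduces to one line. A minimal sketch under assumptions: `beta` is a hypothetical scaling coefficient, and the paper may additionally clip or detach the entropy term, so treat this as illustrative only.

```python
import numpy as np

def entropy_shaped_advantage(advantages, entropy, beta=0.1):
    """Add an entropy-based bonus to every token's advantage (sketch).

    Higher-entropy (branching) tokens receive a larger advantage, which
    encourages exploration at reasoning branch points.
    """
    advantages = np.asarray(advantages, dtype=float)
    entropy = np.asarray(entropy, dtype=float)
    return advantages + beta * entropy
```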
DeepConf calculates group confidence over sliding windows of tokens (e.g., the most recent 2k tokens) rather than for individual tokens. It discards low-confidence traces and aggregates votes only from the high-confidence traces.
https://jiaweizzhao.github.io/deepconf/static/pdfs/deepconf_arxiv.pdf
Deep Think with Confidence
Deep Think with Confidence (DeepConf): A simple yet powerful method that significantly improves both reasoning efficiency and performance at test time.
https://jiaweizzhao.github.io/deepconf/
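The windowed filtering-and-voting scheme can be sketched as follows. This is an assumption-laden sketch, not DeepConf's implementation: the per-token confidence scores, the `keep_frac` quantile cutoff, and scoring a trace by its lowest window are illustrative choices.

```python
import numpy as np

def group_confidence(token_confidences, window=2048):
    """Sliding-window (group) confidence over one trace.

    token_confidences: per-token confidence scores (higher = more confident).
    Returns one mean score per window position.
    """
    c = np.asarray(token_confidences, dtype=float)
    if c.size <= window:
        return np.array([c.mean()])
    kernel = np.ones(window) / window
    return np.convolve(c, kernel, mode="valid")

def filter_and_vote(traces, answers, window=2048, keep_frac=0.9):
    """Score each trace by its weakest window, drop the low-confidence
    tail, and majority-vote over the surviving traces' answers."""
    scores = [group_confidence(t, window).min() for t in traces]
    cutoff = np.quantile(scores, 1 - keep_frac)
    kept = [a for a, s in zip(answers, scores) if s >= cutoff]
    votes = {}
    for a in kept:
        votes[a] = votes.get(a, 0) + 1
    return max(votes, key=votes.get)
```

Scoring a trace by its minimum window confidence means a single low-confidence stretch is enough to disqualify it, even if the rest of the trace looks confident.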

Seonglae Cho