Reasoning Feature

ReasonScore

ReasonScore is a metric for automatically identifying the sparse autoencoder (SAE) features responsible for reasoning in DeepSeek-R1-Llama-8B. The procedure first extracts a set of reasoning-related words R from the dataset. It then filters these words by frequency using external corpora (e.g., the Google Books Ngram Corpus), and researchers manually select the final 10 words. The score is driven by each feature's activity on these reasoning-related tokens. Higher values indicate a stronger emphasis on reasoning-related behavior.
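The pipeline above has two stages: building the reasoning vocabulary R, then scoring each SAE feature against it. The Python sketch below illustrates both stages under several stated assumptions. The reference frequencies are made-up numbers standing in for an external corpus such as the Google Books Ngram Corpus. The ratio threshold and the hand-picked word set are hypothetical. The scoring function, which takes the share of a feature's total activation that falls on tokens in R, is a simplified stand-in and not the exact ReasonScore formula from the paper. The sketch also assumes per-token SAE activations have already been extracted and that tokens align with whole words.

```python
from collections import Counter


def candidate_reasoning_words(trace_tokens, reference_freq, min_ratio, min_count=1):
    """Words over-represented in reasoning traces relative to a reference corpus.

    trace_tokens: word tokens taken from model reasoning traces.
    reference_freq: hypothetical mapping word -> relative frequency in an external
        corpus (a stand-in for, e.g., Google Books Ngram statistics).
    min_ratio: hypothetical threshold on trace frequency / reference frequency.
    """
    counts = Counter(tok.lower() for tok in trace_tokens)
    total = sum(counts.values())
    candidates = []
    for word, count in counts.items():
        if count < min_count:
            continue
        trace_freq = count / total
        ref_freq = reference_freq.get(word, 1e-9)  # floor for words missing from the corpus
        if trace_freq / ref_freq >= min_ratio:
            candidates.append(word)
    return sorted(candidates)


def reason_score(tokens, activations, vocab):
    """Simplified score: share of each feature's activation mass on tokens in vocab.

    tokens: N token strings, assumed to align with whole words.
    activations: N rows of F non-negative SAE feature activations.
    vocab: the curated reasoning vocabulary R.
    Returns F scores in [0, 1]; features that never fire score 0.
    """
    num_features = len(activations[0])
    on_vocab = [0.0] * num_features
    total = [0.0] * num_features
    for tok, row in zip(tokens, activations):
        in_vocab = tok.lower() in vocab
        for j, value in enumerate(row):
            total[j] += value
            if in_vocab:
                on_vocab[j] += value
    return [on_vocab[j] / total[j] if total[j] > 0 else 0.0 for j in range(num_features)]


if __name__ == "__main__":
    trace = "wait the answer is four but wait let me check alternatively".split()
    # Made-up relative frequencies, for illustration only.
    reference = {
        "wait": 0.0002, "the": 0.05, "answer": 0.0003, "is": 0.01,
        "four": 0.0003, "but": 0.004, "let": 0.001, "me": 0.002,
        "check": 0.0003, "alternatively": 0.00001,
    }
    candidates = candidate_reasoning_words(trace, reference, min_ratio=200.0)
    print(candidates)  # ['alternatively', 'answer', 'check', 'four', 'wait']

    # Manual curation: a researcher keeps only genuine reasoning markers.
    hand_picked = {"wait", "alternatively", "check"}  # hypothetical selection
    vocab = set(candidates) & hand_picked

    tokens = ["wait", "the", "answer", "is", "four"]
    activations = [  # rows = tokens, columns = 3 toy SAE features
        [3.0, 0.0, 1.0],
        [0.0, 2.0, 1.0],
        [0.0, 1.0, 1.0],
        [1.0, 1.0, 1.0],
        [0.0, 0.0, 1.0],
    ]
    scores = reason_score(tokens, activations, vocab)
    print(scores)  # [0.75, 0.0, 0.2]
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    print(ranking)  # [0, 2, 1]
```

In this toy run, the frequency filter still lets through content words such as "answer" and "four", which is why a manual selection step is useful. Feature 0 fires mostly on "wait" and ranks highest. Feature 2 fires uniformly on every token and ranks lower.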

Source (arXiv): https://arxiv.org/pdf/2503.18878
