Reasoning Feature

Creator: Seonglae Cho
Created: 2025 Apr 6 19:11
Edited: 2025 Apr 6 19:54

ReasonScore

ReasonScore is a metric for automatically identifying the SAE (sparse autoencoder) features responsible for reasoning in DeepSeek-R1-Llama-8B. A set of reasoning-related words R is first extracted from the dataset and filtered by frequency against external corpora (e.g., the Google Books Ngram Corpus); researchers then manually select 10 words from the remaining candidates. The score focuses on the frequencies of these reasoning-related tokens, and higher values indicate a stronger emphasis on reasoning-related behavior.
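The word-selection step and the idea of "higher score means more reasoning-related" can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `candidate_reasoning_words` and `toy_feature_score`, the thresholds, and the toy frequencies are hypothetical, and the contrast score below is a simplified stand-in, not the published ReasonScore formula.

```python
from collections import Counter


def candidate_reasoning_words(trace_tokens, reference_freq,
                              min_count=2, min_ratio=10.0, eps=1e-9):
    """Rank words that are over-represented in reasoning traces.

    trace_tokens:   tokens taken from model reasoning traces (assumed input).
    reference_freq: word -> relative frequency in an external corpus
                    (a stand-in for something like Google Books Ngram data).
    Returns (word, ratio) pairs sorted by descending ratio.
    """
    counts = Counter(t.lower() for t in trace_tokens)
    total = sum(counts.values())
    ranked = []
    for word, count in counts.items():
        if count < min_count:
            continue  # too rare in the traces to be a reliable candidate
        ratio = (count / total) / (reference_freq.get(word, 0.0) + eps)
        if ratio >= min_ratio:
            ranked.append((word, ratio))
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


def toy_feature_score(activations, tokens, reasoning_words):
    """Toy contrast: mean activation on reasoning words minus mean elsewhere.

    This is NOT the ReasonScore definition, only a simplified illustration
    of scoring a single SAE feature against a word set R.
    """
    on = [a for a, t in zip(activations, tokens) if t.lower() in reasoning_words]
    off = [a for a, t in zip(activations, tokens) if t.lower() not in reasoning_words]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(on) - mean(off)


# Hypothetical toy data
trace_tokens = ["wait", "the", "answer", "is", "wait", "hmm", "the", "hmm", "the"]
reference_freq = {"the": 0.05, "wait": 1e-4, "hmm": 1e-5, "is": 0.01, "answer": 2e-4}

candidates = [word for word, _ in candidate_reasoning_words(trace_tokens, reference_freq)]
score = toy_feature_score([0.9, 0.1, 0.0, 0.1, 0.8],
                          ["wait", "the", "answer", "is", "hmm"],
                          set(candidates))
print(candidates, score)
```

In this sketch the automatic filter only produces a ranked candidate list; the final choice of 10 words remains a manual step, as described above.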
 
 
