Exploring SAE features in LLMs with definition trees and token lists — LessWrong
TL;DR: This post presents a software tool with two separate methods for interpreting SAE features. Both use a "feature vector" b…
https://www.lesswrong.com/posts/w35H4ski8cHMpnWgX/exploring-sae-features-in-llms-with-definition-trees-and