Closest Token Lists

Creator: Seonglae Cho
Created: 2024 Dec 28 16:30
Editor: Seonglae Cho
Edited: 2025 Jul 1 15:49
Builds definition trees from ghost tokens constructed to resemble SAE features, and uses them to extract lists of related tokens.
  • Limitation: the token list approach ignores context
  • Advantage: enables automated interpretability without external LLMs
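The core idea of a closest token list can be sketched as a nearest-neighbour lookup: given a feature vector, rank vocabulary tokens by how similar their embeddings are to it. The sketch below is a minimal illustration, not the referenced tool's implementation. The function names `cosine` and `closest_tokens`, the toy vocabulary, and the choice of cosine similarity are all assumptions made for illustration, and it assumes no vector has zero norm.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_tokens(feature_vec, embeddings, vocab, k=5):
    """Return the k (token, similarity) pairs most similar to feature_vec.

    Hypothetical helper: `embeddings` is a list of token embedding vectors
    aligned with `vocab`; cosine similarity is an assumed metric.
    """
    scored = [(tok, cosine(feature_vec, emb)) for tok, emb in zip(vocab, embeddings)]
    # Sort by similarity, highest first, and keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-dimensional data, made up for illustration only.
vocab = ["cat", "dog", "car", "tree"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
feature_vec = [1.0, 0.0]

result = closest_tokens(feature_vec, embeddings, vocab, k=2)
print([tok for tok, _ in result])
```

The lookup compares the feature vector only against static token embeddings, which is why this kind of list ignores context, and it needs no external LLM to produce a label candidate.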
 
 
 
 
Exploring SAE features in LLMs with definition trees and token lists — LessWrong
TL;DR A software tool is presented which includes two separate methods to assist in the interpretation of SAE features. Both use a "feature vector" b…
https://www.lesswrong.com/posts/w35H4ski8cHMpnWgX/exploring-sae-features-in-llms-with-definition-trees-and

Copyright Seonglae Cho