Direct logit attribution

Creator
Seonglae Cho
Created
2025 Feb 1 22:9
Editor
Seonglae Cho
Edited
2025 Feb 1 22:10
Refs
This technique looks at the direct contribution that the output of some component (for example, an attention head or an MLP layer) makes to the logit for the true next token.
It is a simple type of Direct Path Patching.
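
As a rough illustration of the definition above, the sketch below treats the final residual stream as a plain sum of component outputs and projects each output onto the unembedding column of the true next token. It is a toy under stated assumptions, not how any particular library does it: the final LayerNorm and the unembedding bias are ignored, the weights are random, and `direct_logit_attribution`, `component_outputs`, and the component names are hypothetical.

```python
import numpy as np

def direct_logit_attribution(component_outputs, W_U, target_token):
    """Direct contribution of each component to the target token's logit.

    component_outputs: dict of name -> (d_model,) vector written to the residual stream
    W_U: (d_model, vocab) unembedding matrix
    Assumes no final LayerNorm and no unembedding bias (toy setting).
    """
    direction = W_U[:, target_token]  # logit direction of the true next token
    return {name: float(out @ direction) for name, out in component_outputs.items()}

# Toy data: random weights and hypothetical component names.
rng = np.random.default_rng(0)
d_model, vocab = 8, 5
W_U = rng.normal(size=(d_model, vocab))
component_outputs = {
    "embed": rng.normal(size=d_model),
    "head_0.0": rng.normal(size=d_model),
    "mlp_0": rng.normal(size=d_model),
}
target_token = 3

attributions = direct_logit_attribution(component_outputs, W_U, target_token)

# With no LayerNorm, the per-component contributions add up to the full logit.
final_resid = sum(component_outputs.values())
total_logit = float(final_resid @ W_U[:, target_token])

for name, value in attributions.items():
    print(f"{name}: {value:+.3f}")
print(f"total logit: {total_logit:+.3f}")
```

Because the logit is linear in the residual stream in this simplified setting, the attributions sum exactly to the total logit. In a real model the final LayerNorm breaks exact linearity, so its scale would need to be handled separately (for instance by treating it as fixed).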
A Comprehensive Mechanistic Interpretability Explainer & Glossary - Dynalist
https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=disz2gTx-jooAcR0a5r8e7LZ
 

Copyright Seonglae Cho