SPARE

Creator: Seonglae Cho
Created: 2024 Oct 26 16:5
Editor: Seonglae Cho
Edited: 2025 Mar 10 16:9

SAE-based method for resolving AI Knowledge Conflict

Knowledge conflict is detected by a jump in the AUROC of a logistic regression probe trained to separate DEC from DEM
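The probing idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the activations are synthetic Gaussians, label 1 stands in for DEC and label 0 for DEM, and the AUROC is read off across layers (the note does not say where the jump is measured). All function names are hypothetical.

```python
import math
import random

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression trained by batch gradient descent; returns (weights, bias)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def make_layer(sep, n, dim, rng):
    """Synthetic 'activations': class 1 (assumed DEC) is shifted by `sep` from class 0 (assumed DEM)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label * sep, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y

def probe_auroc_per_layer(seps, n=200, dim=4, seed=0):
    """Train one probe per layer and report its held-out AUROC."""
    rng = random.Random(seed)
    results = []
    for sep in seps:
        X_train, y_train = make_layer(sep, n, dim, rng)
        X_test, y_test = make_layer(sep, n, dim, rng)
        w, b = fit_logreg(X_train, y_train)
        scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X_test]
        results.append(auroc(scores, y_test))
    return results

def largest_jump(values):
    """Index of the layer whose AUROC rises most relative to the previous layer."""
    diffs = [values[i] - values[i - 1] for i in range(1, len(values))]
    return 1 + max(range(len(diffs)), key=diffs.__getitem__)

# Hypothetical setup: the two classes become separable from layer 2 onward.
aurocs = probe_auroc_per_layer([0.0, 0.0, 1.5, 1.5])
jump_layer = largest_jump(aurocs)
print([round(a, 3) for a in aurocs], "jump at layer", jump_layer)
```

In a real setting the synthetic `make_layer` data would be replaced by residual-stream activations collected from the model for DEC and DEM examples; everything else stays the same.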
 
 
 
Mutual information
https://arxiv.org/pdf/2410.15999
 

Copyright Seonglae Cho