
SPARE

Creator: Seonglae Cho
Created: 2024 Oct 26 16:5
Editor: Seonglae Cho
Edited: 2025 Aug 10 21:47

SPARE is an SAE-based method for resolving AI Knowledge Conflict.

Knowledge conflict is detected by a jump in the AUROC of a logistic regression classifier for DEC vs DEM.
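
The detection idea can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the paper's code: the activations are synthetic Gaussians, labels 1 and 0 are stand-ins for the two groups the note calls DEC and DEM, the jump is assumed to be measured across layers (the note does not say), and the layer at which the signal appears (`separations`) is hypothetical. All helper names (`auroc`, `fit_logreg`, `make_layer`) are made up for this sketch.

```python
import math
import random

def auroc(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logreg(X, y, lr=0.1, epochs=200):
    # Batch gradient descent on the mean logistic loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def make_layer(rng, n, dim, separation):
    # Synthetic "activations": class 1 is shifted by `separation` in every dimension.
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + separation * t for _ in range(dim)] for t in y]
    return X, y

rng = random.Random(0)
separations = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0]  # hypothetical: signal appears at layer 3

aurocs = []
for sep in separations:
    X_train, y_train = make_layer(rng, 200, 4, sep)
    X_test, y_test = make_layer(rng, 200, 4, sep)
    w, b = fit_logreg(X_train, y_train)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X_test]
    aurocs.append(auroc(scores, y_test))

# The "jump" is taken here as the layer with the largest AUROC increase over the previous layer.
jump_layer = max(range(1, len(aurocs)), key=lambda l: aurocs[l] - aurocs[l - 1])
for layer, a in enumerate(aurocs):
    print(f"layer {layer}: AUROC = {a:.3f}")
print("largest AUROC jump at layer", jump_layer)
```

With real data, `make_layer` would be replaced by activations extracted from the model at each layer. The rest of the loop would stay the same.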
 
 
 
 
 
Mutual information
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
https://aclanthology.org/2025.naacl-long.264/
Copyright Seonglae Cho