SPARE

Creator

Seonglae Cho

Created

2024 Oct 26 16:5

Editor

Seonglae Cho

Edited

2025 Aug 14 0:38

Refs

SAE-based
AI Knowledge Conflict resolution

Knowledge conflict is detected through a jump in AUROC when a logistic-regression probe classifies DEC vs DEM.
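A minimal sketch of what "an AUROC jump from a logistic-regression probe" means, under loudly stated assumptions: the hidden states are synthetic Gaussian vectors (not real residual-stream activations), label 1 stands in for DEC and label 0 for DEM, and the "layers" are a made-up sweep in which class separation grows. The probe is plain logistic regression fit by gradient descent, and AUROC is computed as the probability that a random positive outscores a random negative. No specific library API is implied.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, lr=0.1, epochs=200):
    """Fit a linear probe w.x + b with full-batch gradient descent on log-loss."""
    dim = len(xs[0])
    w, b, n = [0.0] * dim, 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def auroc(scores, labels):
    """AUROC as P(random positive score > random negative score), ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_layer(sep, n=200, dim=8, seed=0):
    """Hypothetical stand-in for one layer's hidden states.

    Label 1 plays the role of DEC, label 0 of DEM; `sep` shifts the DEC
    examples along the first dimension so the classes become separable.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += sep * y
        xs.append(x)
        ys.append(y)
    return xs, ys


def probe_auroc(sep, seed=0):
    """Train the probe on the first half, report held-out AUROC on the second half."""
    xs, ys = synthetic_layer(sep, seed=seed)
    half = len(xs) // 2
    w, b = train_logreg(xs[:half], ys[:half])
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs[half:]]
    return auroc(scores, ys[half:])


if __name__ == "__main__":
    # Made-up "depth" sweep: separation grows with layer index, so the printed
    # AUROC should sit near 0.5 early and jump towards 1.0 later.
    for layer, sep in enumerate([0.0, 0.0, 0.5, 3.0, 4.0]):
        print(f"layer {layer}: AUROC = {probe_auroc(sep, seed=layer):.3f}")
```

On real models the same loop would run over per-layer activations instead of `synthetic_layer`; the layer where held-out AUROC rises sharply is where the conflict signal becomes linearly readable.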

Mutual information
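A small sketch, assuming mutual information is used to score how informative each SAE feature is about the knowledge-selection behaviour (an interpretation of this note, not a confirmed description of the paper's code). Each feature is binarized as active/inactive at a hypothetical `threshold`, and features are ranked by I(feature active; label) in bits. The activation vectors below are invented toy values.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(activations, labels, threshold=0.0):
    """Rank features by MI between 'feature is active' and the behaviour label.

    `activations` is a list of per-example feature vectors (hypothetical SAE
    outputs); a feature counts as active when its value exceeds `threshold`.
    Returns (mi, feature_index) pairs, highest MI first.
    """
    scores = []
    for j in range(len(activations[0])):
        active = [int(a[j] > threshold) for a in activations]
        scores.append((mutual_information(active, labels), j))
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy behaviour labels
    acts = [
        [0.0, 1.0, 0.0],  # feature 0 tracks the label exactly,
        [0.0, 1.0, 2.0],  # feature 1 is always on,
        [3.0, 1.0, 0.0],  # feature 2 is independent of the label
        [2.0, 1.0, 1.0],
    ]
    print(rank_features(acts, labels))
```

Binarizing keeps the estimate simple and exact on discrete data; continuous activations would need binning or a dedicated estimator instead.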

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.

https://aclanthology.org/2025.naacl-long.264/