Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Large language models (LLMs) have shown a remarkable capability to memorise factual knowledge and solve knowledge-intensive tasks (Petroni et al., 2019; Brown et al., 2020; Touvron et al., 2023; Jiang et al., 2023; Anil et al., 2023).
Nevertheless, the knowledge stored in their parameters (parametric knowledge) can be inaccurate or outdated (Xu et al., 2024).
To alleviate this issue, retrieval- and tool-augmented approaches have been widely adopted to provide LLMs with external knowledge (contextual knowledge) (Karpukhin et al., 2020; Lewis et al., 2020; Wu et al., 2022; Schick et al., 2024).
However, contextual knowledge can conflict with the model's parametric knowledge, leading to what we refer to as knowledge conflicts: for instance, a retrieved passage may state an updated fact that contradicts what the model memorised during pre-training.
Such conflicts can cause undesired behaviour: the model may rely on the less accurate information source and produce incorrect outputs (Mallen et al., 2023; Xie et al., 2024a; Su et al., 2024; Wang et al., 2023; Zhao et al., 2024).