Sparse Feature Circuit

SFC

SHIFT (Sparse Human-Interpretable Feature Trimming)

Task Vector Cleaning
AI Task vector

The Task Vector Cleaning (TVC) algorithm reduces a task vector to a small number of SAE features, isolating the core "execution features" essential for task performance. The Sparse Feature Circuits (SFC) technique, extended and modified for the Gemma-1 2B model, identifies "task detection features" that are distinct from execution features and reveals the causal connections between the two groups (detection → execution). This provides experimental evidence that in-context learning proceeds in two stages, "detecting which task to perform" → "actual execution", taking place primarily in the MLP and attention sublayers of the middle layers.
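To make the idea of "a small number of SAE features" concrete, here is a minimal sketch of sparsifying a vector over a feature dictionary. It is not the paper's TVC procedure: this note does not describe the objective TVC optimizes, so the sketch substitutes a plain reconstruction loss with an L1 penalty (nonnegative sparse coding, solved with projected ISTA). The function name `clean_task_vector`, the `decoder` matrix, and all sizes and hyperparameters are hypothetical.

```python
import numpy as np


def clean_task_vector(task_vector, decoder, l1=0.05, steps=500):
    """Sparse, nonnegative feature weights that approximately rebuild a vector.

    ASSUMPTION: stand-in objective, not the paper's. Minimizes
        0.5 * ||w @ decoder - task_vector||^2 + l1 * sum(w),  subject to w >= 0.

    decoder: (n_features, d_model) array; each row is one (hypothetical)
             SAE decoder direction.
    """
    w = np.zeros(decoder.shape[0])
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (largest singular value of the decoder, squared).
    step = 1.0 / (np.linalg.norm(decoder, ord=2) ** 2)
    for _ in range(steps):
        grad = (w @ decoder - task_vector) @ decoder.T
        # Gradient step, L1 shrinkage, and projection onto w >= 0 in one go.
        w = np.maximum(w - step * (grad + l1), 0.0)
    return w


# Toy data: a "task vector" built from two dictionary rows.
rng = np.random.default_rng(0)
decoder = rng.normal(size=(32, 64))
decoder /= np.linalg.norm(decoder, axis=1, keepdims=True)
task_vector = 2.0 * decoder[3] + 1.5 * decoder[17]

weights = clean_task_vector(task_vector, decoder)
print("strongest features:", np.argsort(weights)[-2:])
```

The L1 term is what drives most weights to exactly zero, leaving only a handful of active features. In the setting described above, the kept features would be judged by their effect on task performance rather than by reconstruction error.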

While induction heads cover the general pattern-matching capabilities of in-context learning, this paper focuses on an instruction dataset to explain the causal relationship between task detection and instruction-following execution.


https://www.arxiv.org/pdf/2504.13756