ITDA

Creator: Seonglae Cho
Created: 2025 May 11 17:26
Edited: 2026 Jan 9 15:56

Inference-Time Decomposition of Activations

Under the Linear Representation Hypothesis, an activation can be treated as a sparse linear combination of feature directions, so a greedy decomposition algorithm becomes possible. ITDA applies Matching pursuit, an algorithm from the compressed sensing field, and the approach leaves much room for improvement. It could become interesting if the computation were made efficient for large models or cross-model scenarios, but the appendix shows there aren't many good features relative to the model size.
ITDA retains over 90% of the reconstruction performance of SAEs and performs similarly on automated interpretability and linear probing tasks, while training 100-1000x faster. It offers an alternative to SAEs for large-scale or repeated analyses that would otherwise be computationally burdensome.
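One plausible reason a greedy method trains so much faster than an SAE is that it needs no gradient descent. The sketch below shows one way a dictionary could be grown in a single pass: decompose each activation with matching pursuit over the atoms collected so far and, if the relative residual is still above a threshold, add the normalised activation itself as a new atom. This construction, the `threshold` and `k` values, the function name, and the toy data are assumptions for illustration, not a confirmed description of the paper's procedure.

```python
import numpy as np


def build_dictionary(activations, k=4, threshold=0.3):
    """Hypothetical single-pass, gradient-free dictionary construction.

    Each activation is decomposed with plain matching pursuit (k steps) over the
    atoms collected so far; if the relative residual norm still exceeds
    `threshold`, the normalised activation is appended as a new atom.
    """
    atoms = []
    for x in activations:
        x = np.asarray(x, dtype=float)
        norm = np.linalg.norm(x)
        if norm == 0.0:
            continue                               # nothing to represent
        residual = x.copy()
        if atoms:
            D = np.stack(atoms)
            for _ in range(k):
                scores = D @ residual
                i = int(np.argmax(np.abs(scores)))
                residual -= scores[i] * D[i]
        if np.linalg.norm(residual) / norm > threshold:
            atoms.append(x / norm)                 # the activation itself becomes an atom
    return np.stack(atoms) if atoms else np.empty((0, 0))


# Toy data: 200 noisy activations that each lie near one of 3 hidden directions.
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 16))
base /= np.linalg.norm(base, axis=1, keepdims=True)
choice = rng.integers(0, 3, size=200)
acts = rng.uniform(1.0, 2.0, size=(200, 1)) * base[choice] + 0.01 * rng.normal(size=(200, 16))

D = build_dictionary(acts, k=4, threshold=0.3)
print("dictionary size:", D.shape[0])
```

In this sketch a lower `threshold` grows a larger dictionary with lower reconstruction error, while a higher one keeps the dictionary small at the cost of fidelity.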
 
 
 
ICML Poster: Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
We introduce Inference-Time Decomposition of Activations (ITDA) models, which can be trained in just 0.1-1% of the time required for SAEs, using only 0.1-1% of the data. Despite this, ITDAs achieve at least 90% of the reconstruction performance of SAEs and deliver comparable results on interpretability benchmarks.
 
 
