ITDA

Inference-Time Decomposition of Activations

Because of the Linear Representation Hypothesis, a greedy decomposition algorithm is feasible. ITDA applies matching pursuit, a greedy algorithm from the field of compressed sensing, and the method still leaves much room for improvement. It could become interesting if the computation is made efficient for large models or cross-model scenarios, but the appendix shows relatively few good features given the model size.
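Matching pursuit itself is short. The sketch below is a generic matching pursuit in NumPy, not the paper's implementation; the function name `matching_pursuit`, the row-wise unit-norm dictionary `D`, and the fixed step count `k` (the target number of active atoms) are assumptions for illustration.

```python
import numpy as np

def matching_pursuit(x, D, k):
    """Generic greedy decomposition of x over the rows of D (illustrative sketch).

    x: (d,) activation vector
    D: (n_atoms, d) dictionary whose rows are assumed to be unit-norm
    k: number of greedy steps (roughly the target L0)
    Returns (coeffs of shape (n_atoms,), residual of shape (d,)).
    """
    coeffs = np.zeros(D.shape[0])
    residual = x.astype(float).copy()
    for _ in range(k):
        scores = D @ residual               # inner product of every atom with the residual
        i = int(np.argmax(np.abs(scores)))  # atom best aligned with what is left
        coeffs[i] += scores[i]              # an atom may be picked again, so accumulate
        residual -= scores[i] * D[i]        # subtract that atom's projection
    return coeffs, residual

# Usage: decompose a vector built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((512, 64))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = 3.0 * D[10] - 2.0 * D[200]
coeffs, residual = matching_pursuit(x, D, k=8)
print(np.linalg.norm(residual) / np.linalg.norm(x))  # relative reconstruction error
```

Because each step removes the projection onto one unit-norm atom, the residual norm never increases, and `coeffs @ D + residual` always equals the input exactly.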

ITDA retains at least 90% of the reconstruction performance of SAEs and performs similarly on automated interpretability and linear probing tasks, while training 100-1000x faster. It offers an alternative to SAEs for large-scale and repetitive analysis tasks that would otherwise be computationally burdensome.

ICML 2025 Poster: Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

We introduce Inference-Time Decomposition of Activations (ITDA) models, which can be trained in just 0.1-1% of the time required for SAEs, using only 0.1-1% of the data. Despite this, ITDAs achieve at least 90% of the reconstruction performance of SAEs and deliver comparable results on interpretability benchmarks.

https://icml.cc/virtual/2025/poster/46477#:~:text=We%20introduce%20Inference%2DTime%20Decomposition,comparable%20results%20on%20interpretability%20benchmarks
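The abstract gives training cost but not how the dictionary is built. One plausible reading, and an assumption here rather than the paper's stated procedure, is that atoms are real activations added greedily in a single pass whenever the current dictionary reconstructs them poorly; that would explain the small time and data budget, since no gradient descent is involved. The sketch below illustrates that idea. The names `build_dictionary` and `threshold` and the relative-error criterion are all hypothetical.

```python
import numpy as np

def matching_pursuit(x, D, k):
    # Generic greedy decomposition over unit-norm rows of D (illustrative sketch).
    coeffs = np.zeros(D.shape[0])
    residual = x.astype(float).copy()
    for _ in range(k):
        scores = D @ residual
        i = int(np.argmax(np.abs(scores)))
        coeffs[i] += scores[i]
        residual -= scores[i] * D[i]
    return coeffs, residual

def build_dictionary(activations, k, threshold):
    """Hypothetical ITDA-style dictionary construction (assumption, not the paper's code).

    activations: (n, d) array of model activations, processed in order
    k: number of matching pursuit steps used to test each activation
    threshold: relative reconstruction error above which an activation becomes an atom
    """
    atoms = []
    for x in activations:
        norm = np.linalg.norm(x)
        if norm == 0:
            continue  # a zero vector cannot be normalized into an atom
        if atoms:
            # Re-stacking every iteration is simple but slow; fine for a sketch.
            _, residual = matching_pursuit(x, np.stack(atoms), k)
            rel_err = np.linalg.norm(residual) / norm
        else:
            rel_err = 1.0  # an empty dictionary reconstructs nothing
        if rel_err > threshold:
            atoms.append(x / norm)  # the atom is the activation itself, unit-normalized
    if not atoms:
        return np.empty((0, activations.shape[1]))
    return np.stack(atoms)
```

Under this assumption, every atom is an actual activation, so each dictionary entry can be traced back to a concrete input.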


Recommendations