TFA

Creator: Seonglae Cho
Created: 2026 Jan 3 22:33
Edited: 2026 Jan 9 23:43

Temporal Feature Analysis

  • Predictable (slow-moving, contextual) component: the part of the current activation that can be predicted from past context
  • Novel (fast-moving, residual) component: new information not explained by that context
TFA constructs a direction that explains the current activation using past activations, implementing this in an attention form such as NEPA. Novel component: "apply an SAE to the residual".
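The decomposition described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the TFA authors' implementation: the function names `temporal_decompose` and `sae_encode`, the projection matrices `W_q`, `W_k`, `W_v`, the choice to form the query from the current activation, and the random untrained weights are all hypothetical. It shows the general shape of the idea: attend over strictly past activations to get a predictable part, subtract it to get the residual, and pass the residual through an SAE-style encoder.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def temporal_decompose(acts, W_q, W_k, W_v):
    """Split activations into a context-predicted part and a residual.

    acts: (T, d) array, one activation vector per token.
    W_q, W_k, W_v: (d, d) hypothetical attention projections (assumption).
    """
    T, d = acts.shape
    keys = acts @ W_k
    values = acts @ W_v
    predictable = np.zeros_like(acts)  # token 0 has no past, so it stays zero
    for t in range(1, T):
        query = acts[t] @ W_q
        # Causal attention: only positions strictly before t are visible.
        scores = keys[:t] @ query / np.sqrt(d)
        predictable[t] = softmax(scores) @ values[:t]
    novel = acts - predictable  # residual not explained by past context
    return predictable, novel

def sae_encode(x, W_enc, b_enc):
    # SAE-style encoder (ReLU of an affine map); weights here are untrained.
    return np.maximum(x @ W_enc + b_enc, 0.0)

rng = np.random.default_rng(0)
T, d, n_features = 6, 8, 16
acts = rng.normal(size=(T, d))  # stand-in for real LM activations
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W_enc = rng.normal(size=(d, n_features)) / np.sqrt(d)
b_enc = np.zeros(n_features)

predictable, novel = temporal_decompose(acts, W_q, W_k, W_v)
novel_codes = sae_encode(novel, W_enc, b_enc)  # "apply SAE to the residual"
```

By construction `predictable + novel` reproduces `acts` exactly, so the split loses no information; in a trained model the projections and the encoder would be learned rather than sampled at random.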
 
 
 
SAEs assume that concepts are independent and stationary over time, but actual LM activations exhibit strong temporal correlations and non-stationarity. The SAE's assumptions of temporal independence and fixed sparsity lead to bottlenecks such as SAE Feature Splitting.
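As a rough illustration of what "temporal correlation" means here, the sketch below measures lag-1 autocorrelation on a synthetic sequence. Everything in it is an assumption made for demonstration: the data is an AR(1) process rather than real LM activations, and `lag1_autocorr`, `rho`, and the sequence sizes are hypothetical choices. It only shows how a sequence with temporal structure differs from the same vectors in shuffled order, which is closer to what a temporally independent model implicitly assumes; it does not demonstrate non-stationarity.

```python
import numpy as np

def lag1_autocorr(acts):
    """Mean lag-1 autocorrelation across the dimensions of a (T, d) sequence."""
    centered = acts - acts.mean(axis=0)
    num = (centered[1:] * centered[:-1]).sum(axis=0)
    den = (centered ** 2).sum(axis=0)
    return float((num / den).mean())

rng = np.random.default_rng(0)
T, d, rho = 5000, 4, 0.9

# Synthetic AR(1) sequence: each step is mostly predictable from the last one.
acts = np.zeros((T, d))
for t in range(1, T):
    acts[t] = rho * acts[t - 1] + rng.normal(size=d)

# Shuffling time steps destroys the temporal structure but keeps the marginals.
shuffled = acts[rng.permutation(T)]

correlated_score = lag1_autocorr(acts)
shuffled_score = lag1_autocorr(shuffled)
print(f"ordered: {correlated_score:.3f}, shuffled: {shuffled_score:.3f}")
```

The ordered sequence should score close to `rho`, while the shuffled one should score near zero.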
Temporal Feature Analysis (TFA) decomposes activations into predictable (slow, contextual) components and novel (fast, residual) components. It outperforms SAEs in garden-path sentence parsing, event boundary detection, and capturing long-range structure. In other words, interpretability tools require an Inductive Bias aligned with the temporal structure of the data.

Token sequence

 
 
 
