Non‑linear Representation Dilemma

Creator
Seonglae Cho
Created
2025 Jul 1 14:54
Edited
2025 Jul 18 0:01
Refs
The dilemma: if we allow the assumption that representations can be encoded non-linearly, Causal Abstraction itself becomes vacuous as an interpretive tool. Without an additional assumption such as the Linear Representation Hypothesis (which constrains the alignment map to be linear), mechanistic interpretability cannot be guaranteed. When the Causal Abstraction framework is generalized by removing the linearity constraint on the alignment map φ and allowing arbitrary non-linear functions, it can be shown theoretically that any DNN can be perfectly aligned with any algorithm, making it impossible to identify which algorithm the model actually implements. A non-linear φ of limited complexity can still yield meaningful abstractions; the framework only becomes vacuous when any φ whatsoever is permitted. For a non-linear φ to support interpretability, one must therefore specify which non-linear function family V is chosen and why there are grounds to believe in it.
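A minimal toy sketch of why an unrestricted φ is vacuous (hypothetical setup, not from the paper): if φ may be any function, a lookup table memorizing the hidden-state-to-algorithm-variable correspondence always achieves perfect alignment, regardless of what the network actually computes. Here `network_hidden` is an arbitrary opaque scalar representation and the "algorithm" is XOR; the names are illustrative assumptions.

```python
def network_hidden(x):
    # Arbitrary "DNN" hidden representation; deliberately unrelated to XOR.
    return (x[0] * 7 + x[1] * 3 + 1) ** 2  # injective over the four inputs

def algorithm_intermediate(x):
    # High-level algorithm variable we claim the network "implements": XOR.
    return x[0] ^ x[1]

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Unrestricted phi: simply memorize hidden state -> algorithm variable.
# This is a legal alignment map once non-linear phi of arbitrary
# complexity is allowed, yet it explains nothing about the model.
phi = {network_hidden(x): algorithm_intermediate(x) for x in inputs}

accuracy = sum(
    phi[network_hidden(x)] == algorithm_intermediate(x) for x in inputs
) / len(inputs)
print(accuracy)  # 1.0 by construction
```

The alignment "succeeds" with accuracy 1.0 for any choice of `network_hidden`, which is exactly the dilemma: restricting φ to a declared family V (e.g. linear maps) is what makes a successful alignment informative.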

Recommendations