Non‑linear Representation Dilemma

Creator: Seonglae Cho
Created: 2025 Jul 1 14:54
Edited: 2025 Dec 23 23:08
The dilemma: if we allow the assumption that representations can be encoded non-linearly, Causal abstraction itself becomes meaningless as an interpretive tool. Without additional assumptions such as the Linear Representation Hypothesis (which enforces a linear alignment map), mechanistic interpretability cannot be guaranteed.
When the causal abstraction framework is generalized by removing the linearity constraint on the alignment map and allowing arbitrary non-linear functions, it can be shown that any DNN can be made to perfectly match any algorithm, making it impossible to identify which algorithm the model actually implements. Non-linear maps of limited complexity can still provide meaningful abstractions; the framework only becomes vacuous (meaningless) when any φ is allowed. For non-linear alignment maps to support interpretability, one must therefore specify which non-linear family is chosen and why there are grounds to believe in it.
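The sketch below illustrates the vacuity argument on a toy problem. Everything in it (the XOR task, the lookup-defined φ, all names) is my own illustration under stated assumptions, not the paper's construction: if φ may be any function, we can define it by lookup so that interchange interventions are perfectly consistent with a chosen algorithm, while revealing nothing about the model's actual mechanism.

```python
# Minimal sketch (hypothetical toy, not the paper's construction) of why an
# unrestricted alignment map phi makes intervention-consistency vacuous.
import numpy as np

rng = np.random.default_rng(0)
inputs = [(a, b) for a in (0, 1) for b in (0, 1)]

# Toy "model": an arbitrary hidden state per input; the output head is a
# lookup that returns XOR(a, b). How it computes XOR is a black box.
hidden = {x: tuple(rng.normal(size=4).round(3)) for x in inputs}

def model_output(h):
    x = next(k for k, v in hidden.items() if v == h)
    return x[0] ^ x[1]

# High-level algorithm A: one intermediate variable V = a XOR b, output = V.
V = {x: x[0] ^ x[1] for x in inputs}

# Unrestricted alignment map, defined by lookup: phi reads V (plus residual
# info) straight off the hidden state; phi_inv decodes a desired V value
# back into some hidden state that realizes it.
def phi(h):
    x = next(k for k, v in hidden.items() if v == h)
    return V[x], x                      # (high-level value, residual)

def phi_inv(v, r):
    # The toy output depends only on V, so the residual r is irrelevant
    # here; in the general construction phi is a bijection and r is kept.
    x = next(k for k in inputs if V[k] == v)
    return hidden[x]

# Interchange intervention: patch V from a source run into a base run,
# decode with phi_inv, and run the model forward. Consistency with the
# algorithm holds for every (base, source) pair -- by construction.
for base in inputs:
    for source in inputs:
        v_s, _ = phi(hidden[source])    # read V from the source run
        _, r_b = phi(hidden[base])      # keep the base run's residual
        patched = phi_inv(v_s, r_b)
        assert model_output(patched) == V[source]   # = A(base | V <- v_s)

print("Perfect intervention-consistency on all pairs -- vacuously.")
```

The lookup φ was constructed purely from the desired algorithm, never from the model's internals, yet it passes every interchange-intervention check; this is the sense in which "any φ" makes the criterion uninformative.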
 
 

Non‑linear Representation Dilemma

Causal abstraction uses an alignment map to connect model hidden states to the intermediate variables of an algorithm, but the definition itself does not require that map to be linear. The paper shows, through both existence arguments and learning experiments, that if the map is made sufficiently expressive, almost any model can be aligned with almost any algorithm under intervention-consistency.
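To make intervention-consistency concrete, here is a hedged formalization in the spirit of the causal-abstraction setup (the notation below is mine, not the paper's). Let $h_x = M_{<\ell}(x)$ be the model's hidden state at layer $\ell$ on input $x$, and let the alignment map $\phi$ expose a high-level variable $V$ of the algorithm $A$ together with a residual:

$$
\phi(h_x) = (v_x,\, r_x), \qquad v_x \in \mathrm{Val}(V).
$$

An interchange intervention with base input $b$ and source input $s$ patches $V$ in $\phi$-space and decodes back; consistency then demands

$$
M_{\ge \ell}\big(\phi^{-1}(v_s,\, r_b)\big) = A\big(b \mid V \leftarrow v_s\big) \quad \text{for all } b, s.
$$

The dilemma is that if $\phi$ ranges over arbitrary bijections, this condition can almost always be satisfied, so satisfying it alone says nothing about the model's mechanism.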
 
 
