The dilemma: if we allow the assumption that representations can be encoded non-linearly, causal abstraction itself becomes meaningless as an interpretive tool. Without additional assumptions such as the Linear Representation Hypothesis (which restricts the alignment map to be linear), mechanistic interpretability cannot be guaranteed.
When causal abstraction is generalized by removing the linearity constraint on the alignment map and allowing arbitrary non-linear functions, it can be shown theoretically that any DNN can be made to match any algorithm perfectly, making it impossible to identify which algorithm the model actually implements. Non-linear maps of limited complexity can still provide meaningful abstractions; the notion only becomes vacuous when "any φ" is allowed. For non-linear alignment maps to support interpretability, one must specify which non-linear family is chosen and why there are grounds to believe in it.
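A minimal sketch of the expressivity intuition behind this (synthetic data and probe choices are my own, not from the paper; probing stands in for the alignment map φ, whereas the paper's actual criterion is intervention consistency): an unrestricted non-linear φ can "recover" even random labels from hidden states, so recoverability alone says little about what the model computes, while a linear φ is constrained enough to fail such a test.

```python
# Sketch only: fit a linear vs. an expressive non-linear probe to arbitrary
# (random) labels on synthetic "hidden states" to show why an unrestricted
# phi makes the alignment claim nearly unfalsifiable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
H = rng.normal(size=(512, 64))      # stand-in for model hidden states
z = rng.integers(0, 2, size=512)    # arbitrary "algorithm variable" labels

linear_phi = LogisticRegression(max_iter=1000).fit(H, z)
nonlinear_phi = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000).fit(H, z)

print("linear phi fit:", linear_phi.score(H, z))         # well below 1.0
print("non-linear phi fit:", nonlinear_phi.score(H, z))  # typically close to 1.0
```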
Non‑linear Representation Dilemma
Causal abstraction uses an alignment map to connect model hidden states ↔ intermediate variables of an algorithm, but the definition itself does not restrict that map to be linear. It is shown, through both existence arguments and learning experiments, that if the map is made sufficiently expressive, almost any model can be made intervention-consistent with almost any algorithm.
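A minimal sketch of what such an intervention-consistency check looks like, assuming a toy model, a toy algorithm, and an invertible alignment map φ (here just a fixed orthogonal rotation; all names are hypothetical). The paper's point is that nothing in the definition prevents φ from being an arbitrarily powerful invertible non-linear network instead of a linear map like this one.

```python
# Sketch only: interchange intervention on the hidden state via phi, compared
# against the corresponding intervention on the high-level algorithm.
import numpy as np

rng = np.random.default_rng(0)
d_hidden = 8
W1 = rng.normal(size=(2, d_hidden))
W2 = rng.normal(size=(d_hidden, 1))
R, _ = np.linalg.qr(rng.normal(size=(d_hidden, d_hidden)))  # orthogonal matrix

def model(x):
    """Toy 'DNN': input -> hidden state -> scalar output."""
    h = np.tanh(x @ W1)                  # hidden state to be aligned
    return h, (h @ W2).item()

def algorithm(a, b):
    """Toy high-level algorithm with one intermediate variable v = a + b."""
    v = a + b
    return v, v > 0                      # (intermediate variable, output)

def phi(h):      return h @ R            # hidden state -> "algorithm coordinates"
def phi_inv(z):  return z @ R.T          # exact inverse since R is orthogonal

def interchange(h_base, h_src, coords):
    """Overwrite the phi-coordinates meant to encode v with their source values."""
    z_base, z_src = phi(h_base), phi(h_src)
    z_base[coords] = z_src[coords]
    return phi_inv(z_base)

x_base, x_src = rng.normal(size=2), rng.normal(size=2)
h_base, _ = model(x_base)
h_src, _ = model(x_src)

h_int = interchange(h_base, h_src, coords=[0])   # low-level intervention
y_model = (h_int @ W2).item() > 0                # model output after intervention

v_src, _ = algorithm(*x_src)                     # algorithm's value of v on the source
y_algo = v_src > 0                               # algorithm output with v := v_src

# Intervention consistency requires these to agree across many (base, source)
# pairs -- the interchange-intervention accuracy the alignment map is scored on.
print("model:", y_model, "algorithm:", y_algo)
```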

Seonglae Cho