Distributed Alignment Search (DAS)
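The method learns an orthogonal matrix that rotates the activations of a chosen layer into a new basis. It then uses interchange interventions (swapping part of the rotated activation from one input's run into another's) and optimizes the rotation so that the network's behavior aligns with a high-level causal model, i.e. a causal abstraction. The focus is on distributed representations spread across many neurons, in contrast to neuron-level analysis or sparse autoencoders (SAEs), which try to decompose activations into monosemantic features.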
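Rotating the basis of the activation vector lets the method find high-level causal variables that are not aligned with individual neurons. The rotation keeps the same number of dimensions, however, so under the Superposition Hypothesis (a layer may encode more features than it has dimensions) there is a limit to how many variables it can separate.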
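
The rotate, swap, rotate-back step of an interchange intervention can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`rotation`, `matvec`, `transpose`, `interchange`) are hypothetical, the hidden state is only 2-dimensional, and the rotation angle is fixed by hand, whereas in the actual method the orthogonal matrix would be learned by optimization.

```python
import math

def rotation(theta):
    """2x2 orthogonal (rotation) matrix. Assumption: fixed by hand here,
    whereas the method described above would learn it."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose of m; for an orthogonal matrix this is also its inverse."""
    return [list(col) for col in zip(*m)]

def interchange(base, source, rot, dims):
    """Interchange intervention in the rotated basis.

    Rotates both activations, copies the rotated coordinates listed in
    `dims` from `source` into `base`, and rotates the result back.
    """
    rotated_base = matvec(rot, base)
    rotated_source = matvec(rot, source)
    mixed = [
        rotated_source[i] if i in dims else rotated_base[i]
        for i in range(len(rotated_base))
    ]
    return matvec(transpose(rot), mixed)

# Hypothetical setup: the variable of interest lies along the direction
# (1, 1) / sqrt(2), so it is spread over both neurons.
rot = rotation(-math.pi / 4)   # row 0 of rot is (1, 1) / sqrt(2)
base = [1.0, 0.0]              # activation from the base input
source = [0.0, 3.0]            # activation from the source input

patched = interchange(base, source, rot, dims={0})
print(patched)
```

After the swap, `patched` has the source's component along (1, 1)/sqrt(2) and the base's component along the orthogonal direction. Swapping a single neuron in the original basis could not achieve this, which is the point of searching over rotations.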