DAS

Distributed alignment search

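DAS learns an orthogonal matrix that rotates the activations of a chosen layer into a new basis. It then applies interchange interventions in that rotated basis and optimizes the rotation so that the network's behavior aligns with a hypothesized high-level causal model (causal abstraction).

It focuses on distributed representations, where a variable may be spread across many neurons, rather than on decomposing activations into monosemantic features as a Sparse Autoencoder tries to do.

Rotating the basis of the activation vector can identify high-level causal variables, but the approach is limited under the Superposition Hypothesis: the rotated space has the same dimension as the original, so it cannot separate more features than the layer has dimensions.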
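As a rough illustration of the rotate, swap, rotate-back step described above, here is a minimal NumPy sketch. It is not the paper's implementation: in DAS the rotation is learned by gradient descent on an interchange-intervention objective, whereas here a random orthogonal matrix stands in for it. The names `random_orthogonal`, `interchange_intervention`, and the subspace size `k` are hypothetical and chosen only for this example.

```python
import numpy as np


def random_orthogonal(d, seed=0):
    """Return a d x d orthogonal matrix (stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def interchange_intervention(base, source, R, k):
    """Replace the first k rotated coordinates of `base` with those of `source`.

    Assumption: the high-level variable is aligned with the first k
    dimensions of the rotated basis.
    """
    rotated_base = R @ base
    rotated_source = R @ source
    mixed = rotated_base.copy()
    mixed[:k] = rotated_source[:k]   # swap in the source's subspace
    return R.T @ mixed               # rotate back to the neuron basis


if __name__ == "__main__":
    d, k = 4, 2
    R = random_orthogonal(d)
    base = np.array([1.0, 2.0, 3.0, 4.0])
    source = np.array([-1.0, 0.0, 5.0, 2.0])
    patched = interchange_intervention(base, source, R, k)
    # The patched activation would then be fed to the rest of the network.
    print(patched.shape)
```

Because `R` is square, the rotated space always has the same `d` dimensions as the layer, which is the limitation noted above: a rotation can re-express features but cannot create room for more than `d` of them.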


https://arxiv.org/pdf/2303.02536

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is...

https://proceedings.mlr.press/v236/geiger24a.html
