Sparse Autoencoders (SAEs) rest on the Linear Representation Hypothesis, the assumption that a neural network's internal representations can be decomposed linearly, orthogonally, and sparsely, yet growing research suggests that concepts can also be hierarchical, non-linear, and multi-dimensional. MP-SAE applies Matching Pursuit to the SAE framework: sparse codes are generated by iteratively projecting the residual onto the dictionary and removing that projection, which ensures conditional orthogonality at each step. On hierarchical synthetic data, conventional SAEs either fail to distinguish parent and child concepts or capture only flat structure, whereas MP-SAE accurately reconstructs both the distinctions between levels and the correlations within each level.
https://arxiv.org/pdf/2506.03093
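The iterative projection-and-removal step can be illustrated with a minimal greedy Matching Pursuit sketch over a fixed dictionary. This is a generic illustration of the algorithm, not the paper's exact MP-SAE architecture; the dictionary `D`, sparsity budget `k`, and the assumption of unit-norm atoms are mine.

```python
import numpy as np

def matching_pursuit(x, D, k):
    """Greedy Matching Pursuit: at each step, pick the dictionary atom
    most correlated with the current residual, record its coefficient,
    and subtract the projection from the residual.

    Assumes the columns of D (the atoms) have unit norm.
    """
    residual = x.astype(float).copy()
    code = np.zeros(D.shape[1])
    for _ in range(k):
        scores = D.T @ residual          # inner products with all atoms
        j = np.argmax(np.abs(scores))    # best-matching atom this step
        code[j] += scores[j]             # accumulate its coefficient
        residual -= scores[j] * D[:, j]  # remove the projection
    return code, residual

# Hypothetical usage on a random unit-norm dictionary
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 64))
D /= np.linalg.norm(D, axis=0)
x = rng.normal(size=16)
code, residual = matching_pursuit(x, D, k=8)
```

By construction `x == D @ code + residual`, and after each selection the new residual is orthogonal to the atom just chosen, which is the conditional-orthogonality property the summary refers to.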

Seonglae Cho