SpaDE

Creator
Seonglae Cho
Created
2025 Mar 8 12:38
Edited
2025 Mar 11 11:31

Sparsemax Distance Encoder

Abstract

SAE aims to solve a bilevel optimization problem in which the outer optimization minimizes reconstruction error plus a sparsity regularizer, while the inner optimization finds the optimal projection values for the encoder given a constraint set (e.g., the probability simplex). SAE architectures implicitly assume specific data structures (linear separability, fixed sparsity / fixed dimensionality, etc.), which leads them to capture different concepts. When it cannot reflect the actual data structure (nonlinear separability, concept heterogeneity, etc.), an SAE may miss important concepts. SpaDE is designed to explicitly reflect each concept's intrinsic dimension and nonlinear separability by using the probability simplex as its projection set, enabling variable dimensionality through the sparsemax proxy and generating more interpretable and specialized latent-unit (feature) prototypes than existing SAEs. However, its Euclidean-distance assumption (distances between input vectors and prototypes are measured with Euclidean distance) may not suit all data types, and excessive specialization may limit generalization.

Preliminaries

The Duality Between SAE Architectures and Their Implicit Data Assumptions
SAE encoders project the input onto an architecture-specific constraint set. This projection fundamentally determines which features an SAE can extract and which it will suppress.

Receptive field

They define the SAE’s receptive field, a concept popularly used in neuroscience:
$$\mathcal{F}_k = \{x \in \mathbb{R}^d \mid f^{(k)}(x) > 0\}$$
Intuitively, $\mathcal{F}_k$ represents the region of input space where neuron $k$ is active. For projection-based encoders, the receptive field can be rewritten as:
$$\mathcal{F}_k = f^{-1}(S \cap \{z_k > 0\}),$$
where $S$ is the projection set of the encoder. That is, $\mathcal{F}_k$ is the pre-image of the intersection of the projection set with the half-space $\{z_k > 0\}$. Alternatively, it can be viewed as the complement of the pre-image of the set $S \cap \{z_k = 0\}$, where the hyperplane $z_k = 0$ indicates that latent $k$ is “dead”. This implies that the structure of receptive fields in an SAE is dictated by its encoder’s architecture.
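To make this concrete, a latent's receptive field can be probed empirically by sampling inputs and checking which ones activate it. A minimal NumPy sketch, assuming a plain ReLU encoder (all names and shapes here are illustrative):

```python
import numpy as np

def relu_encode(x, W, b):
    """A generic projection-based SAE encoder: f(x) = ReLU(Wx + b)."""
    return np.maximum(W @ x + b, 0.0)

def in_receptive_field(x, k, encode, **params):
    """Membership test for F_k = {x : f^(k)(x) > 0}."""
    return encode(x, **params)[k] > 0.0

# Trace out F_k empirically by probing latent k over sampled inputs.
rng = np.random.default_rng(0)
d, m = 8, 32                          # input dim, number of latents
W, b = rng.normal(size=(m, d)), rng.normal(size=m)
xs = rng.normal(size=(1000, d))
hits = np.array([in_receptive_field(x, 0, relu_encode, W=W, b=b) for x in xs])
print(f"latent 0 fires on {hits.mean():.1%} of sampled inputs")
```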

Duality

There is a fundamental duality between how concepts are organized in model representations and how an SAE encoder’s receptive fields should be structured to optimally identify those concepts. Crucially, this implies that any SAE is implicitly biased towards identifying concepts that are organized in a specific manner.

SAE Assumptions

TopK assumes separation by solid angle, i.e., concepts belong to non-overlapping hyperpyramids (unbounded convex polytopes with flat faces and a single corner at the origin) in high dimensions.
  • ReLU SAE: captures heterogeneity due to its adaptive sparsity.
  • Top-K SAE: based on ranking and relative magnitudes within vectors rather than absolute component values, making concepts angularly separable.
  • JumpReLU SAE: captures heterogeneity due to its adaptive sparsity.
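For reference, a minimal sketch of the three encoder nonlinearities behind these assumptions (the pre-activations `z` and the threshold values are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU: sparsity adapts to the input (any number of latents can fire)."""
    return np.maximum(z, 0.0)

def topk(z, k):
    """Top-K: exactly k latents fire, chosen by relative magnitude (angular)."""
    out = np.zeros_like(z)
    idx = np.argpartition(z, -k)[-k:]   # indices of the k largest pre-activations
    out[idx] = z[idx]
    return out

def jumprelu(z, theta):
    """JumpReLU: per-latent thresholds theta; sparsity is again adaptive."""
    return np.where(z > theta, z, 0.0)

z = np.array([0.9, -0.2, 0.4, 0.1])
print(relu(z))                 # [0.9 0.  0.4 0.1] -> 3 active
print(topk(z, k=2))            # [0.9 0.  0.4 0. ] -> always exactly 2 active
print(jumprelu(z, theta=0.3))  # [0.9 0.  0.4 0. ] -> depends on thresholds
```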

Concept heterogeneity

For concept heterogeneity, SAEs must demonstrate adaptive sparsity in their latent representations; that is, different concepts must be able to activate different numbers of latents. For projection nonlinearities, this implies that the projection set $S$ must admit points with varying levels of sparsity.
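Sparsemax, the simplex projection that SpaDE builds on, illustrates this requirement: unlike softmax it produces exact zeros, and the size of its support varies with the input. A minimal sketch of the standard sorting-based algorithm (Martins & Astudillo, 2016):

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cssv   # which sorted coords stay positive
    k = ks[support][-1]                  # size of the support
    tau = (cssv[k - 1] - 1) / k          # threshold subtracted from z
    return np.maximum(z - tau, 0.0)

# Support size adapts to the input: peaked inputs activate few latents,
# flat inputs activate many -- exactly the adaptive sparsity required above.
print(sparsemax(np.array([3.0, 0.1, 0.0, -1.0])))   # one nonzero entry
print(sparsemax(np.array([0.5, 0.4, 0.45, 0.42])))  # four nonzero entries
```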

SpaDE Method

Due to the use of Euclidean distances in choosing active indices, the receptive field is a union of convex polytopes in the vicinity of the prototype $a_k$ of latent $k$. This incorporates the notions of locality and flexibility in receptive-field shapes, allowing latents to capture nonlinearly separable concepts.
The outer optimization for SpaDE is a locality-enforced version of dictionary learning called K-Deep Simplex (KDS). The distance-weighted L1 regularizer encourages each datapoint to use the dictionary atoms that are close to it in Euclidean distance, inducing a soft clustering bias.
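A minimal sketch of this objective, assuming the encoder is sparsemax applied to negative scaled squared distances to the prototypes; `beta` and `lam` are illustrative hyperparameters, not values from the paper:

```python
import numpy as np

def sparsemax(z):
    """Projection onto the probability simplex (as sketched above)."""
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    k = ks[1 + ks * z_sorted > cssv][-1]
    tau = (cssv[k - 1] - 1) / k
    return np.maximum(z - tau, 0.0)

def kds_loss(X, A, beta=1.0, lam=0.1):
    """KDS-style objective: reconstruction + distance-weighted L1 (locality)."""
    # Squared Euclidean distances d2[i, k] = ||x_i - a_k||^2 to each prototype.
    d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    # Encode: simplex codes, so nearby prototypes receive most of the mass.
    Z = np.stack([sparsemax(-beta * row) for row in d2])
    X_hat = Z @ A                                # decode with the same atoms
    recon = ((X - X_hat) ** 2).sum(-1).mean()
    locality = (Z * d2).sum(-1).mean()           # z >= 0, so this is the weighted L1
    return recon + lam * locality

rng = np.random.default_rng(0)
X, A = rng.normal(size=(64, 16)), rng.normal(size=(32, 16))
print(kds_loss(X, A))
```
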
Note how SpaDE satisfies the two data properties of nonlinear separability and heterogeneity:
  • The projection set $S$ in SpaDE is the probability simplex, which admits edges/corners with varying levels of sparsity, thereby allowing the representation of heterogeneous concepts.
  • The receptive fields of SpaDE are local to each prototype (encoder weight vector) and are flexibly shaped as unions of convex polytopes. This allows SpaDE latents to become monosemantic to concepts that are nonlinearly separable from the rest of the data.

Separability Experiment

  • Setup: 2D Gaussian cluster data, configured so that some clusters are linearly separable while others are only nonlinearly separable (a toy version is sketched after this list).
  • Observations:
    • ReLU and JumpReLU capture linearly separable concepts well but show low F1 scores on nonlinearly separable concepts.
    • TopK shows some flexibility but still has limitations.
    • SpaDE achieves nearly 100% F1 in both cases, maintaining clear concept distinction and locality.
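A toy version of this setup; the cluster layout and the latent-versus-concept F1 scoring are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

# Concept A: a single Gaussian blob (linearly separable from the rest).
concept_a = rng.normal(loc=(4.0, 0.0), scale=0.3, size=(500, 2))
# Concept B: a noisy ring around the origin (only nonlinearly separable).
angles = rng.uniform(0, 2 * np.pi, size=500)
concept_b = np.stack([np.cos(angles), np.sin(angles)], axis=1) * 2.0
concept_b += rng.normal(scale=0.1, size=concept_b.shape)

X = np.vstack([concept_a, concept_b])
labels = np.array([0] * 500 + [1] * 500)

def f1_latent_vs_concept(active, concept_mask):
    """F1 between a latent's activation mask and a ground-truth concept mask."""
    tp = np.sum(active & concept_mask)
    prec = tp / max(active.sum(), 1)
    rec = tp / max(concept_mask.sum(), 1)
    return 2 * prec * rec / max(prec + rec, 1e-12)

# Example: a "distance-to-origin" latent is active exactly on the ring.
ring_latent = np.abs(np.linalg.norm(X, axis=1) - 2.0) < 0.5
print(f1_latent_vs_concept(ring_latent, labels == 1))
```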

Heterogeneity Experiment

  • Setup: Generated Gaussian clusters with different intrinsic dimensions (6, 14, 30, 62, 126) in a 128-dimensional ambient space (a generation sketch follows this list).
  • Observations:
    • ReLU and JumpReLU adapt somewhat but fail to fully reflect each concept's intrinsic dimension.
    • TopK fails to effectively reproduce high-dimensional concepts due to fixed sparsity.
    • SpaDE flexibly adjusts sparsity (number of active neurons) per concept, showing near-ideal reproduction performance.
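A sketch of generating such clusters; the random-subspace embedding is an assumption, while the intrinsic dimensions match the setup above:

```python
import numpy as np

def make_cluster(intrinsic_dim, ambient_dim=128, n=1000, rng=None):
    """Gaussian cluster of a given intrinsic dimension, embedded in ambient_dim."""
    rng = rng or np.random.default_rng(0)
    latent = rng.normal(size=(n, intrinsic_dim))
    # Orthonormal basis of a random intrinsic_dim-dimensional subspace.
    basis, _ = np.linalg.qr(rng.normal(size=(ambient_dim, intrinsic_dim)))
    center = rng.normal(scale=5.0, size=ambient_dim)   # separate the clusters
    return latent @ basis.T + center

clusters = [make_cluster(d, rng=np.random.default_rng(d)) for d in (6, 14, 30, 62, 126)]
X = np.vstack(clusters)   # an ideal SAE should use roughly d active latents for cluster d
```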

Formal Language Experiment

  • Setup: Training the SAE on intermediate activations of a Transformer model, using text generated from a PCFG (probabilistic context-free grammar); a toy sampler is sketched after this list.
  • Observations:
    • Clear clustering and separation phenomena observed between various Parts-of-Speech.
    • SpaDE particularly excels at capturing monosemantic characteristics, generating more interpretable representations than other SAEs.
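For context, a toy PCFG sampler of the kind such a setup relies on; this grammar is a made-up illustration, not the paper's, and the POS tags double as ground-truth concept labels:

```python
import random

# A toy PCFG: nonterminal -> list of (production, probability).
GRAMMAR = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["Det", "N"], 0.7), (["Det", "Adj", "N"], 0.3)],
    "VP":  [(["V", "NP"], 0.6), (["V"], 0.4)],
    "Det": [(["the"], 0.5), (["a"], 0.5)],
    "Adj": [(["red"], 0.5), (["small"], 0.5)],
    "N":   [(["cat"], 0.5), (["dog"], 0.5)],
    "V":   [(["sees"], 0.5), (["chases"], 0.5)],
}

def sample(symbol="S", rng=random.Random(0)):
    """Recursively expand a symbol; anything not in GRAMMAR is a terminal."""
    if symbol not in GRAMMAR:
        return [symbol]
    productions, weights = zip(*GRAMMAR[symbol])
    choice = rng.choices(productions, weights=weights)[0]
    return [tok for sym in choice for tok in sample(sym, rng)]

print(" ".join(sample()))  # e.g. "the cat chases a small dog"
```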

Vision Experiment

  • Setup: Training the SAE on image tokens extracted from the Imagenette dataset using the DINOv2 model (token extraction is sketched after this list).
  • Observations:
    • SpaDE effectively separates interpretable visual concepts like object foreground/background and detailed parts, showing high F1 scores and distinct activation patterns for each class.
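A sketch of the token-extraction step, assuming the public torch.hub entry point for DINOv2; the `forward_features` output key follows the DINOv2 repository's conventions and should be treated as an assumption:

```python
import torch
from torchvision import transforms
from PIL import Image

# Load a small DINOv2 backbone from torch.hub (facebookresearch/dinov2).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = preprocess(Image.open("imagenette_sample.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feats = model.forward_features(img)
patch_tokens = feats["x_norm_patchtokens"]   # (1, 256, 384) for ViT-S/14 at 224px
# Each patch token is one training example for the SAE.
```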