Activation Density

Creator
Seonglae Cho
Created
2025 Jan 26 22:9
Edited
2025 Aug 9 1:2
Activation density is the fraction of a feature's activations that are nonzero.
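As a minimal sketch of this definition, the snippet below counts, for each feature, the share of inputs on which its activation is nonzero. The `[num_tokens][num_features]` layout, the function name `activation_density`, and the `eps` threshold are assumptions made for illustration, not part of any particular SAE library.

```python
from typing import Sequence


def activation_density(
    activations: Sequence[Sequence[float]], eps: float = 0.0
) -> list[float]:
    """Per-feature fraction of inputs whose |activation| exceeds eps.

    `activations` is assumed to be a [num_tokens][num_features] table
    (hypothetical layout chosen for this sketch).
    """
    if not activations:
        return []
    num_tokens = len(activations)
    num_features = len(activations[0])
    counts = [0] * num_features
    for row in activations:
        for j, value in enumerate(row):
            if abs(value) > eps:
                counts[j] += 1
    return [count / num_tokens for count in counts]


# Toy data: 4 tokens, 3 features.
acts = [
    [0.0, 1.2, 0.3],
    [0.0, 0.0, 0.1],
    [2.5, 0.0, 0.4],
    [0.0, 0.0, 0.2],
]
print(activation_density(acts))  # [0.25, 0.25, 1.0]
```

In this toy data the third feature fires on every token, so its density is 1.0 even though its activations are all small.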
A high Activation Density can mean either that sparsity was not learned properly or that the feature is genuinely important across many situations. In the Feature Browser, SAE features are more interpretable in their higher activation Quantile ranges. This exposes a limitation: SAE features have low interpretability at low activations, and their activations show a certain skewness.
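To compare interpretability across activation quantiles, one option is to split a feature's nonzero activations into equal-count bins and inspect examples from each bin separately. The sketch below assumes nonnegative (ReLU-style) activations; `quantile_bins` and its `num_bins` parameter are hypothetical names, not the Feature Browser's actual implementation.

```python
def quantile_bins(values: list[float], num_bins: int = 4) -> list[list[float]]:
    """Group nonzero activations into equal-count bins, lowest bin first.

    Assumes nonnegative activations, so "nonzero" means "> 0".
    """
    nonzero = sorted(v for v in values if v > 0)
    n = len(nonzero)
    bins: list[list[float]] = [[] for _ in range(num_bins)]
    for rank, value in enumerate(nonzero):
        # Map the rank of each value to a bin index in [0, num_bins - 1].
        index = min(rank * num_bins // n, num_bins - 1)
        bins[index].append(value)
    return bins


values = [0.0, 0.1, 0.2, 0.3, 0.4, 2.0, 2.2, 0.0, 2.4, 0.5]
for i, bin_values in enumerate(quantile_bins(values)):
    print(i, bin_values)
```

The last bin holds the strongest activations, which is where the examples would be expected to be most interpretable.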
However, the features with the highest Activation Density in the Activation Distribution are less interpretable, mainly because they typically do not reach high activation values in absolute terms (as opposed to quantile terms). A well-separated, highly interpretable SAE feature should not have a density that simply decreases with activation value. Instead, after an initial decrease, it should show a cluster at high activation levels.
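The shape described above can be checked roughly with a histogram: bin the nonzero activations by value and test whether the counts rise again after an initial decrease. This is only a crude heuristic sketch. The helper names `histogram` and `has_high_activation_cluster`, the equal-width binning, and the bin count are all assumptions, and real histograms are noisy enough that smoothing or a minimum-count threshold would likely be needed.

```python
def histogram(values: list[float], num_bins: int = 5) -> list[int]:
    """Equal-width histogram of nonzero activations over (0, max]."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        return [0] * num_bins
    top = max(nonzero)
    counts = [0] * num_bins
    for value in nonzero:
        # The maximum value falls into the last bin.
        index = min(int(value / top * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


def has_high_activation_cluster(counts: list[int]) -> bool:
    """True if the counts increase again after having decreased."""
    decreased = False
    for previous, current in zip(counts, counts[1:]):
        if current < previous:
            decreased = True
        elif current > previous and decreased:
            return True
    return False


# Density that only decays with activation value.
monotone = [0.1] * 8 + [0.3] * 4 + [0.5] * 2 + [0.7, 1.0]
# Density that decays, then clusters at high activation.
clustered = [0.1] * 8 + [0.3] * 3 + [0.5] + [0.9] * 5 + [1.0]

print(histogram(monotone), has_high_activation_cluster(histogram(monotone)))
print(histogram(clustered), has_high_activation_cluster(histogram(clustered)))
```

Under this heuristic, only the second toy feature shows the high-activation cluster associated with a well-separated feature.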
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze.
Activation density is also used for quantization:
Activation Density based Mixed-Precision Quantization for Energy...
As neural networks gain widespread adoption in embedded devices, there is a need for model compression techniques to facilitate deployment in resource-constrained environments. Quantization is one...