Activation Sparsity

Creator
Seonglae Cho
Created
2024 Oct 5 22:38
Edited
2025 May 19 15:30
Refs
The higher the activation sparsity, the more unnecessary computation can be skipped, improving model efficiency. The idea is theoretically supported by the
Superposition Hypothesis
, and high activation sparsity also helps
Mechanistic interpretability
(features are easier to separate with a
Sparse Autoencoder
)
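A minimal sketch of what "activation sparsity" means in practice, using NumPy (the function name and example values here are illustrative, not from the sources above): sparsity is simply the fraction of zero, or near-zero, entries in a layer's activation vector, and ReLU produces exact zeros by construction.

```python
import numpy as np

def activation_sparsity(x: np.ndarray, eps: float = 0.0) -> float:
    """Fraction of entries whose magnitude is <= eps (exact zeros by default)."""
    return float(np.mean(np.abs(x) <= eps))

rng = np.random.default_rng(0)
pre_act = rng.normal(size=10_000)   # pre-activations, roughly half negative
relu_out = np.maximum(pre_act, 0)   # ReLU zeroes every negative entry

print(f"ReLU sparsity: {activation_sparsity(relu_out):.2f}")  # roughly 0.5 here
```

The zeroed entries contribute nothing to the following matrix multiply, which is what sparsity-aware kernels exploit.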
 
 
 
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity...
Activation sparsity refers to the existence of considerable weakly-contributed elements among activation outputs. As a prevalent property of the models using the ReLU activation function,...

CATS with threshold activation

When the activation vector is sparse, as with ReLU outputs, computation can be optimized by skipping zero entries. However, modern architectures like SwiGLU produce mostly nonzero activations; many of these are merely epsilon-scale values, so zeroing them out below a threshold recovers sparsity and makes computation efficient.
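The thresholding idea above can be sketched as follows (an assumed simplification of CATS, not the paper's exact procedure; function names and the target-sparsity parameter are illustrative): pick a magnitude cutoff from the empirical distribution of activations so that a desired fraction of entries falls below it, then zero those entries.

```python
import numpy as np

def cats_cutoff(acts: np.ndarray, target_sparsity: float) -> float:
    """Magnitude cutoff such that ~target_sparsity of entries fall below it."""
    return float(np.quantile(np.abs(acts), target_sparsity))

def sparsify(acts: np.ndarray, cutoff: float) -> np.ndarray:
    """Zero out epsilon-scale activations; large-magnitude ones pass through."""
    return np.where(np.abs(acts) >= cutoff, acts, 0.0)

rng = np.random.default_rng(1)
swiglu_like = rng.normal(scale=0.1, size=10_000)  # mostly small but nonzero
cutoff = cats_cutoff(swiglu_like, target_sparsity=0.7)
sparse = sparsify(swiglu_like, cutoff)
print(f"zero fraction after thresholding: {np.mean(sparse == 0):.2f}")
```

Unlike ReLU, which gives sparsity for free, this trades a small, controllable amount of signal (the discarded epsilon values) for a predictable compute saving.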
 
 
