Nonlinear SAE

Creator
Seonglae Cho
Created
2025 Feb 3 12:50
Editor
Edited
2025 Feb 3 12:52

ProLU SAE

Rather than abandoning the linearity rationale underlying SAE interpretability, ProLU reinforces it through a nonlinear activation function. Unlike ReLU, the derivative of ProLU with respect to the bias is undefined, because the bias enters only through the thresholding condition rather than the output value. To resolve this, a composite (pseudo-)gradient is introduced for training, and the bias is designed to perform only noise suppression (gating which units fire) instead of simply translating the pre-activation.
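A minimal NumPy sketch of the forward pass can make the gating behavior concrete. It assumes the definition ProLU(m, b) = m if m + b > 0, else 0; the function names are illustrative, and the training-time pseudo-gradient is omitted.

```python
import numpy as np

def relu_encode(pre, b):
    # Standard SAE encoder: the bias translates the pre-activation,
    # so it both gates and shifts the output magnitude.
    return np.maximum(pre + b, 0.0)

def prolu_encode(pre, b):
    # ProLU: the bias only decides whether a unit fires (gating);
    # when a unit is active, its output is the untranslated pre-activation.
    return np.where(pre + b > 0, pre, 0.0)

pre = np.array([2.0, 0.5, -1.0])
b = np.array([-1.0, -1.0, -1.0])
relu_encode(pre, b)   # -> [1.0, 0.0, 0.0]  (active value shifted by b)
prolu_encode(pre, b)  # -> [2.0, 0.0, 0.0]  (b suppresses noise, no shift)
```

Because the bias appears only inside the indicator `pre + b > 0`, its gradient is zero almost everywhere, which is why a surrogate gradient is needed during training.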