Jacobian SAE

Creator: Seonglae Cho
Created: 2025 Mar 17 23:28
Edited: 2025 Apr 14 22:3

JSAE

JSAE uses two SAEs: one trained on the input activations of a model component (such as an MLP) and the other trained on its output activations.
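The two SAEs are separate but trained simultaneously, with an extra loss term that encourages the Jacobian between the input SAE's latents and the output SAE's latents to be sparse.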
k denotes the number of non-zero elements kept by the TopK activation function used in the SAEs.
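To make the setup concrete, here is a minimal, dependency-free sketch of one JSAE loss evaluation. It assumes bias-free linear encoders and decoders, a single tanh layer as a hypothetical stand-in for the component f, and a loss of the form MSE(x, x̂) + MSE(y, ŷ) + (λ / k²)·‖J‖₁, where J is the Jacobian of f_s = TopK ∘ enc_y ∘ f ∘ dec_x, the function that maps input-SAE latents to output-SAE latents. That exact form of the penalty, differentiating only with respect to the k active input latents, and every name in the code (`f_s`, `jacobian`, `jsae_loss`, `lam`) are assumptions for illustration, not details given on this page.

```python
import math
import random

def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def topk(z, k):
    # TopK activation: keep the k largest pre-activations, zero out the rest.
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

def f(x, W_f):
    # Hypothetical stand-in for the model component whose input/output are sparsified.
    return [math.tanh(h) for h in matvec(W_f, x)]

def f_s(s_x, dec_x, W_f, enc_y, k):
    # Map input-SAE latents to output-SAE latents: decode, apply f, encode with TopK.
    return topk(matvec(enc_y, f(matvec(dec_x, s_x), W_f)), k)

def jacobian(s_x, dec_x, W_f, enc_y, k, eps=1e-5):
    # Central finite differences of f_s. Assumption: only the k active input
    # latents are differentiated, so columns of inactive latents stay exactly zero.
    n_y = len(enc_y)
    J = [[0.0] * len(s_x) for _ in range(n_y)]
    for j, v in enumerate(s_x):
        if v == 0.0:
            continue
        up, down = list(s_x), list(s_x)
        up[j] += eps
        down[j] -= eps
        a = f_s(up, dec_x, W_f, enc_y, k)
        b = f_s(down, dec_x, W_f, enc_y, k)
        for i in range(n_y):
            J[i][j] = (a[i] - b[i]) / (2 * eps)
    return J

def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def jsae_loss(x, W_f, enc_x, dec_x, enc_y, dec_y, k, lam):
    # Assumed form: reconstruction MSE of both SAEs plus an L1 Jacobian
    # penalty scaled by lam / k**2.
    y = f(x, W_f)
    s_x = topk(matvec(enc_x, x), k)
    s_y = topk(matvec(enc_y, y), k)
    J = jacobian(s_x, dec_x, W_f, enc_y, k)
    l1 = sum(abs(v) for row in J for v in row)
    return mse(x, matvec(dec_x, s_x)) + mse(y, matvec(dec_y, s_y)) + lam / k**2 * l1

# Usage with small random weights (all shapes are illustrative).
rng = random.Random(0)
def rand(rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]
d, n, k = 4, 8, 2
enc_x, dec_x, W_f, enc_y, dec_y = rand(n, d), rand(d, n), rand(d, d), rand(n, d), rand(d, n)
x = [rng.gauss(0, 1) for _ in range(d)]
print(jsae_loss(x, W_f, enc_x, dec_x, enc_y, dec_y, k, lam=1.0))
```

Finite differences keep the sketch free of dependencies; in practice the Jacobian would more likely come from automatic differentiation (for example `jax.jacfwd` or `torch.func.jacrev`), with the penalty averaged over a batch.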

Scalar function

Input: the j-th latent of the input SAE (a scalar value), with all other input latents held fixed.
Output: the i-th latent of the output SAE produced by that input change (a scalar value).
This lets us analyze whether the relationship between an individual pair of latents is linear or nonlinear, and check whether the corresponding element of the Jacobian accurately predicts these changes. Interestingly, these scalar functions are mostly linear.
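The scalar function for one latent pair can be sketched the same way. Assuming the same hypothetical pipeline (bias-free linear maps, a tanh layer standing in for the component, TopK encoders), the code below sets input latent j to a range of values t while holding the other input latents fixed, reads off output latent i, and compares the result with the first-order prediction from the Jacobian entry J[i][j]. The names `scalar_function` and `linear_prediction` are illustrative, not taken from the source.

```python
import math
import random

def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def topk(z, k):
    # TopK activation: keep the k largest pre-activations, zero out the rest.
    keep = set(sorted(range(len(z)), key=lambda i: z[i], reverse=True)[:k])
    return [z[i] if i in keep else 0.0 for i in range(len(z))]

def f_s(s_x, dec_x, W_f, enc_y, k):
    # Hypothetical pipeline: decode input latents, apply a tanh stand-in for f,
    # then encode with the output SAE's TopK encoder.
    y = [math.tanh(h) for h in matvec(W_f, matvec(dec_x, s_x))]
    return topk(matvec(enc_y, y), k)

def scalar_function(s_x, i, j, ts, dec_x, W_f, enc_y, k):
    # f_ij(t): set input latent j to t, keep all other input latents fixed,
    # and read off output latent i.
    out = []
    for t in ts:
        s = list(s_x)
        s[j] = t
        out.append(f_s(s, dec_x, W_f, enc_y, k)[i])
    return out

def linear_prediction(s_x, i, j, ts, dec_x, W_f, enc_y, k, eps=1e-5):
    # First-order prediction from the Jacobian entry J[i][j] (central finite
    # difference) taken at the original latent vector s_x.
    up, down = list(s_x), list(s_x)
    up[j] += eps
    down[j] -= eps
    J_ij = (f_s(up, dec_x, W_f, enc_y, k)[i] - f_s(down, dec_x, W_f, enc_y, k)[i]) / (2 * eps)
    base = f_s(s_x, dec_x, W_f, enc_y, k)[i]
    return [base + J_ij * (t - s_x[j]) for t in ts]

# Usage: pick an active input latent j and an active output latent i.
rng = random.Random(1)
def rand(rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]
d, n, k = 4, 8, 2
enc_x, dec_x, W_f, enc_y = rand(n, d), rand(d, n), rand(d, d), rand(n, d)
s_x = topk(matvec(enc_x, [rng.gauss(0, 1) for _ in range(d)]), k)
j = next(idx for idx, v in enumerate(s_x) if v != 0.0)
i = next(idx for idx, v in enumerate(f_s(s_x, dec_x, W_f, enc_y, k)) if v != 0.0)
ts = [s_x[j] + step for step in (-0.5, -0.25, 0.0, 0.25, 0.5)]
actual = scalar_function(s_x, i, j, ts, dec_x, W_f, enc_y, k)
predicted = linear_prediction(s_x, i, j, ts, dec_x, W_f, enc_y, k)
for t, a, p in zip(ts, actual, predicted):
    print(f"t={t:+.3f}  f_ij(t)={a:+.4f}  Jacobian prediction={p:+.4f}")
```

At t equal to the original latent value the two columns coincide by construction; how far they stay close for larger steps depends on f and on whether the TopK active sets change along the way.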

Recommendations