
JumpReLU SAE

Creator
Seonglae Cho
Created
2025 Mar 17 11:6
Editor
Seonglae Cho
Edited
2025 May 20 14:43

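Generalizes the Gated SAE: each feature learns its own threshold, and pre-activations that do not exceed it are zeroed out.

Trained with an L0 sparsity loss in addition to the reconstruction loss.

The JumpReLU SAE activation is defined with the unit step (Heaviside) function H:
JumpReLU_θ(z) = z · H(z − θ)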
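
To make the activation and the L0 term concrete, here is a minimal pure-Python sketch. The function names `jumprelu` and `l0_penalty`, and the example numbers, are illustrative assumptions and do not come from either paper's code. It covers only the forward pass: the step function has zero gradient almost everywhere, so real training needs a gradient estimator for the threshold, which this sketch leaves out.

```python
def jumprelu(z, theta):
    """JumpReLU per feature: keep z_i if z_i > theta_i, otherwise output 0."""
    return [zi if zi > ti else 0.0 for zi, ti in zip(z, theta)]

def l0_penalty(z, theta):
    """L0 term: number of features whose pre-activation exceeds its threshold."""
    return sum(1 for zi, ti in zip(z, theta) if zi > ti)

# Hypothetical pre-activations for one input and per-feature thresholds.
z = [0.5, 1.5, -0.2, 2.0]
theta = [1.0, 1.0, 0.5, 0.5]

acts = jumprelu(z, theta)   # only the 2nd and 4th features pass their threshold
l0 = l0_penalty(z, theta)   # count of active features
```

Unlike a plain ReLU, a small positive pre-activation such as 0.5 is dropped when it sits below its threshold, while features that pass are kept at full magnitude rather than shrunk.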

Does this mean they efficiently implemented the gating mechanism using JumpReLU activation?
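
One way to probe this question is a small numerical check. The sketch below assumes a gated encoder whose magnitude path is a positive per-feature rescaling of the gate path (shared encoder direction, separate biases). This parametrization and all names (`gated_feature`, `jumprelu_feature`, `scale`) are assumptions made for illustration, not the papers' exact formulation. Under that assumption, gate times magnitude collapses to a single pre-activation passed through a JumpReLU with a non-negative threshold.

```python
import math
import random

def gated_feature(pre, b_gate, b_mag, scale):
    """Gated encoder for one feature: binary gate times ReLU magnitude."""
    gate = 1.0 if pre + b_gate > 0 else 0.0
    magnitude = max(scale * pre + b_mag, 0.0)
    return gate * magnitude

def jumprelu_feature(pre, b_gate, b_mag, scale):
    """Same feature written as one pre-activation z with a JumpReLU threshold."""
    z = scale * pre + b_mag
    # Gate condition pre > -b_gate, rewritten in terms of z; clamped at 0
    # because the ReLU on the magnitude path already removes negative z.
    theta = max(b_mag - scale * b_gate, 0.0)
    return z if z > theta else 0.0

# Compare both forms on random (hypothetical) parameters and inputs.
rng = random.Random(0)
max_diff = 0.0
for _ in range(1000):
    pre, b_gate, b_mag = (rng.gauss(0, 1) for _ in range(3))
    scale = math.exp(rng.gauss(0, 1))  # positive rescaling
    a = gated_feature(pre, b_gate, b_mag, scale)
    b = jumprelu_feature(pre, b_gate, b_mag, scale)
    max_diff = max(max_diff, abs(a - b))
```

If `max_diff` stays at floating-point noise, the two forms agree on these samples, which is consistent with reading JumpReLU as a compact way to express the gate under the tying assumed here.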
Google DeepMind, preliminary JumpReLU result (Gated SAE paper)
https://arxiv.org/pdf/2404.16014
JumpReLU SAE paper (architecture used in Gemma Scope)
https://arxiv.org/pdf/2407.14435
Differentiable pre-activation loss (pre-act loss)
Circuits Updates - January 2025
We report a number of developing ideas on the Anthropic interpretability team, which might be of interest to researchers working actively in this space. Some of these are emerging strands of research where we expect to publish more on in the coming months. Others are minor points we wish to share, since we're unlikely to ever write a paper about them.
https://transformer-circuits.pub/2025/january-update/index.html
 
 
 

Copyright Seonglae Cho