ProLU SAE
Rather than abandoning the linear-representation rationale that underpins SAE interpretability, ProLU uses its nonlinearity to preserve it: feature magnitudes are passed through linearly, and the nonlinearity acts only in deciding whether a feature fires at all. Concretely, unlike ReLU, the encoder bias in ProLU never shifts the pre-activation value that gets passed through; it appears only inside the gating condition, so it serves as a noise-suppressing threshold rather than a parallel translation of the input. Because the bias enters only through this gate, its derivative is zero almost everywhere (and undefined exactly at the threshold), so ordinary backpropagation would give it no training signal; ProLU therefore substitutes a surrogate gradient for the bias in the backward pass.
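A minimal PyTorch sketch of this idea follows. It assumes the gating form ProLU(m, b)_i = m_i when m_i > 0 and m_i + b_i > 0, else 0, and uses a ReLU-style surrogate gradient for the bias (one of the surrogate variants proposed for ProLU; a straight-through variant differs in detail). The names `ProLU`, `W_enc`, and `b_enc`, and the shapes and initialization, are illustrative, not the reference implementation.

```python
import torch


class ProLU(torch.autograd.Function):
    """Gated activation: passes m_i through unchanged when m_i > 0 and
    m_i + b_i > 0, and outputs 0 otherwise. The bias b only decides
    whether a feature fires; it never shifts the magnitude itself."""

    @staticmethod
    def forward(ctx, m, b):
        gate = ((m > 0) & (m + b > 0)).to(m.dtype)
        ctx.save_for_backward(gate)
        ctx.b_shape = b.shape
        return m * gate

    @staticmethod
    def backward(ctx, grad_out):
        (gate,) = ctx.saved_tensors
        # d(output)/dm is 1 on the active set, 0 elsewhere.
        grad_m = grad_out * gate
        # The true d(output)/db is 0 almost everywhere, so b would never
        # train. Surrogate (assumption: ReLU-style variant): give b the
        # gradient it would receive if the activation were ReLU(m + b).
        grad_b = grad_out * gate
        # b is broadcast over the batch; reduce back to its own shape.
        while grad_b.dim() > len(ctx.b_shape):
            grad_b = grad_b.sum(0)
        return grad_m, grad_b


# Illustrative encoder usage: the bias gates the pre-activation but is
# never added to the value that reaches the decoder.
d_model, n_features = 512, 4096
W_enc = torch.nn.Parameter(torch.randn(d_model, n_features) * 0.02)
b_enc = torch.nn.Parameter(torch.zeros(n_features))

x = torch.randn(8, d_model)
f = ProLU.apply(x @ W_enc, b_enc)
f.sum().backward()  # b_enc.grad is nonzero only via the surrogate gradient
```

Note the design choice this encodes: when a feature is active, its output equals the raw pre-activation `(x @ W_enc)_i`, so activation magnitude stays proportional to how strongly the input points along the feature direction, while `b_enc` acts purely as a firing threshold.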