DSG

Creator: Seonglae Cho
Created: 2025 May 10 22:20
Editor: Seonglae Cho
Edited: 2025 Aug 14 20:16
Refs: SPARE, Fisher Information Matrix

Dynamic SAE Guardrails

Despite the name, it is not really dynamic; it is rather a type of Conditional Vector Steering.
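
To make the "conditional steering" reading concrete, here is a minimal sketch in plain Python. It is an illustration of the general pattern, not the paper's exact procedure: the function names (`sae_encode`, `conditional_steer`), the toy identity weights, the thresholds, and the clamp value are all assumptions chosen for readability. The idea is that a fixed set of forget-related SAE features is checked on each input, and the activation is only modified when enough of them fire.

```python
from typing import List, Sequence, Tuple


def sae_encode(h: Sequence[float],
               w_enc: Sequence[Sequence[float]],
               b_enc: Sequence[float]) -> List[float]:
    """Toy SAE encoder: one ReLU(w . h + b) activation per feature."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(w_enc, b_enc)]


def conditional_steer(h: Sequence[float],
                      w_enc: Sequence[Sequence[float]],
                      b_enc: Sequence[float],
                      w_dec: Sequence[Sequence[float]],
                      forget_features: Sequence[int],
                      act_threshold: float,
                      count_threshold: int,
                      clamp_value: float) -> Tuple[List[float], bool]:
    """Steer h only if enough forget-related features fire (hypothetical rule)."""
    acts = sae_encode(h, w_enc, b_enc)

    # Condition: count the selected features whose activation exceeds the threshold.
    fired = [i for i in forget_features if acts[i] > act_threshold]
    if len(fired) < count_threshold:
        return list(h), False  # input looks unrelated: leave it untouched

    # Intervention: move each selected feature from its current activation to
    # clamp_value by adding the difference along that feature's decoder direction.
    steered = list(h)
    for i in forget_features:
        delta = clamp_value - acts[i]
        for j in range(len(steered)):
            steered[j] += delta * w_dec[i][j]
    return steered, True


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy encoder and decoder weights
    zero_bias = [0.0, 0.0]
    for h in ([2.0, 3.0], [0.1, 3.0]):
        out, triggered = conditional_steer(
            h, identity, zero_bias, identity,
            forget_features=[0], act_threshold=0.5,
            count_threshold=1, clamp_value=-1.0)
        print(h, "->", out, "steered" if triggered else "unchanged")
```

The selected feature set and the steering direction are fixed ahead of time; only the decision to apply them depends on the input, which is why the method reads more naturally as conditional steering than as something dynamic.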
 
 
 
 
Gradient
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails...
Machine unlearning is a promising approach to improve LLM safety by removing unwanted knowledge from the model. However, prevailing gradient-based unlearning methods suffer from issues such as high...
https://openreview.net/forum?id=8gFO7ebDLT
https://arxiv.org/pdf/2504.08192v1
SAE DSG (Dynamic SAE guardrail)
https://arxiv.org/pdf/2504.08192
 

 

Backlinks

SAE Steering

Copyright Seonglae Cho