Model Unlearning via Sparse Autoencoder Subspace Guided Projections
Large language models (LLMs) store vast knowledge but pose privacy and safety risks when targeted content must be removed. Existing unlearning approaches such as gradient-based methods, model...
https://openreview.net/forum?id=MIlqM98o9I