Affine Concept Editing

Creator
Seonglae Cho
Created
2025 Jun 15 18:7
Edited
2025 Jun 15 18:11
'Affine Concept Editing (ACE)' identifies race and gender directions in a model's activation space and, during inference, shifts activations along those directions to a neutral value. This reduces measured bias to less than 2.5% while limiting the loss in general performance on benchmarks like MMLU to 0.1-3.7%. It's called 'affine' because, after an activation is projected onto the concept direction, only that component is moved to the neutral point (which need not be zero); the rest of the activation is left unchanged.
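The edit described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the method's reference implementation: the function names `affine_concept_edit` and `estimate_direction_and_neutral` are hypothetical, the concept direction is assumed to be a difference of group means, and the neutral value is assumed to be the midpoint of the two groups' projections. The source does not specify how either quantity is obtained.

```python
import numpy as np


def estimate_direction_and_neutral(group_a, group_b):
    """Estimate a concept direction and a neutral value from two groups.

    ASSUMPTION: difference-of-means direction and midpoint neutral value.
    Both are illustrative choices, not confirmed by the source.
    group_a, group_b: arrays of shape (n_samples, hidden_dim).
    """
    mean_a = group_a.mean(axis=0)
    mean_b = group_b.mean(axis=0)
    direction = mean_a - mean_b
    unit = direction / np.linalg.norm(direction)
    # Midpoint of the two group means along the concept direction.
    neutral = 0.5 * (mean_a @ unit + mean_b @ unit)
    return unit, neutral


def affine_concept_edit(activations, direction, neutral_value):
    """Set each activation's component along `direction` to `neutral_value`.

    Everything orthogonal to `direction` is left unchanged.
    activations: array of shape (n_samples, hidden_dim).
    """
    unit = direction / np.linalg.norm(direction)
    coeff = activations @ unit  # current component along the direction
    # Shift only along `unit`, by the gap between neutral and current value.
    return activations + np.outer(neutral_value - coeff, unit)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    offset = np.zeros(8)
    offset[0] = 2.0  # synthetic "concept" signal on the first coordinate
    group_a = rng.normal(size=(100, 8)) + offset
    group_b = rng.normal(size=(100, 8)) - offset

    unit, neutral = estimate_direction_and_neutral(group_a, group_b)
    acts = np.vstack([group_a, group_b])
    edited = affine_concept_edit(acts, unit, neutral)

    # Every edited activation now has the same component along the direction.
    print(np.allclose(edited @ unit, neutral))
```

With `neutral_value = 0` the same function reduces to plain directional ablation (projecting the direction out); the non-zero offset is what makes the edit affine rather than linear. In a real model this operation would be applied to intermediate activations at inference time, for example through a forward hook, which is outside the scope of this sketch.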
 
 
 
 
 
 
 
