Affine Concept Editing (ACE) identifies race and gender directions in a model's activation space and shifts activations along them to a neutral value during inference, reducing bias to less than 2.5% while limiting general performance loss on benchmarks such as MMLU to 0.1-3.7%. It is called 'affine' because, after projecting an activation onto the concept direction, it moves only that component to the neutral point (rather than simply zeroing it out) and leaves the rest of the activation unchanged.
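The edit described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the original implementation: it assumes the concept direction is the normalized difference of group means and that the neutral value is the midpoint of the two groups' mean projections. The function names (`concept_direction`, `neutral_value`, `affine_edit`) and the toy activations are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def concept_direction(group_a, group_b):
    # Assumption: unit difference-of-means direction between two groups.
    mu_a, mu_b = mean(group_a), mean(group_b)
    return unit([a - b for a, b in zip(mu_a, mu_b)])

def neutral_value(group_a, group_b, direction):
    # Assumption: neutral point = midpoint of the groups' mean projections.
    proj_a = dot(mean(group_a), direction)
    proj_b = dot(mean(group_b), direction)
    return 0.5 * (proj_a + proj_b)

def affine_edit(x, direction, neutral):
    # Move only the component of x along `direction` to `neutral`;
    # everything orthogonal to `direction` is left untouched.
    shift = neutral - dot(x, direction)
    return [xi + shift * di for xi, di in zip(x, direction)]

# Toy activations for two groups (hypothetical data).
group_a = [[2.0, 1.0, 0.0], [4.0, 3.0, 0.0]]
group_b = [[0.0, 1.0, 0.0], [2.0, 3.0, 0.0]]

direction = concept_direction(group_a, group_b)
neutral = neutral_value(group_a, group_b, direction)
edited = affine_edit([4.0, 3.0, 0.0], direction, neutral)
```

After the edit, the projection of `edited` onto `direction` equals `neutral` regardless of which group the input came from, while the other coordinates are unchanged. In a real model the same operation would be applied to residual-stream activations with tensor operations at inference time.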
Affine Concept Editing
Creator: Seonglae Cho
Created: 2025 Jun 15 18:07
Editor: Seonglae Cho
Edited: 2025 Jun 15 18:11
Refs
