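Affine Concept Editing (ACE) identifies racial and gender directions in a model's activation space and, at inference time, shifts each activation's component along those directions to a neutral value. This reduces measured bias to below 2.5% while limiting the loss in general performance on benchmarks such as MMLU to 0.1-3.7%. It is called 'affine' because the projection onto the concept direction is taken relative to a neutral reference point rather than the origin. The edit removes only the displacement from that neutral point along the concept direction and leaves the rest of the activation unchanged.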
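A minimal NumPy sketch of the idea follows. It contrasts a plain linear projection, which pushes the concept component to zero (the origin), with an affine version, which pushes it to a neutral reference point. Several pieces are assumptions and do not come from this note. The concept direction is estimated as a difference of group means. The neutral point `mu` is taken to be the midpoint of the two group means. The activations are synthetic. The helper names `concept_direction`, `neutral_point`, `linear_ablation`, and `affine_concept_edit` are hypothetical. In a real model, the edit would presumably be applied to a chosen layer's activations during the forward pass, which is not shown here.

```python
import numpy as np


def concept_direction(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit difference-in-means direction between two groups of activations.

    acts_a, acts_b: arrays of shape (n_samples, d_model). Difference-in-means
    is an assumed estimator here, not necessarily the one ACE uses.
    """
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def neutral_point(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Hypothetical neutral reference: the midpoint of the two group means."""
    return 0.5 * (acts_a.mean(axis=0) + acts_b.mean(axis=0))


def linear_ablation(v: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Linear edit: project onto r_hat relative to the origin and remove it.

    Afterwards the r_hat component is 0, i.e. the origin is treated as neutral.
    """
    coeff = np.asarray(v @ r_hat)
    return v - coeff[..., None] * r_hat


def affine_concept_edit(v: np.ndarray, r_hat: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Affine edit: project the displacement (v - mu) onto r_hat and remove it.

    Afterwards the r_hat component equals mu's; all components orthogonal to
    r_hat are left unchanged. Works for a single vector (d,) or a batch (n, d).
    """
    coeff = np.asarray((v - mu) @ r_hat)
    return v - coeff[..., None] * r_hat


# Synthetic demo: two groups separated along axis 0, both offset from the
# origin, so the origin is NOT a neutral value for this concept.
rng = np.random.default_rng(0)
d = 8
axis = np.zeros(d)
axis[0] = 1.0
acts_a = rng.normal(size=(200, d)) + 7.0 * axis
acts_b = rng.normal(size=(200, d)) + 3.0 * axis

r_hat = concept_direction(acts_a, acts_b)
mu = neutral_point(acts_a, acts_b)
v = acts_a[0]

print("original component:", v @ r_hat)
print("after linear edit :", linear_ablation(v, r_hat) @ r_hat)  # ~0 (origin)
print("after affine edit :", affine_concept_edit(v, r_hat, mu) @ r_hat)
print("neutral target    :", mu @ r_hat)  # matches the affine result
```

The two edits differ only in subtracting `mu` before projecting. The linear version implicitly treats the origin as the neutral value. The affine version lets the neutral value be any point, which is the sense of "affine" described above.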
Affine Concept Editing
Created: 2025 Jun 15 18:07
Edited: 2025 Jun 15 18:11