Affine Concept Editing

Creator

Seonglae Cho

Created

2025 Jun 15 18:07

Editor

Seonglae Cho

Edited

2025 Jun 15 18:11

Refs

Affine Transformation

'Affine Concept Editing (ACE)' identifies race and gender directions in a model's activation space and, during inference, moves activations along those directions to a neutral value. This reduces measured bias to less than 2.5% while limiting general performance loss on benchmarks like MMLU to 0.1-3.7%. It's called 'affine' because the edit is a projection plus an offset: the component of an activation along the concept direction is replaced with the value at the neutral point instead of being zeroed out, and everything orthogonal to that direction is left unchanged.
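A minimal NumPy sketch of this kind of edit, under assumptions that are not taken from the paper: the concept direction is estimated as a difference of group means, the neutral value is the midpoint of the two groups' mean projections, and the names concept_direction and affine_concept_edit are hypothetical. For a unit direction u and neutral value c, the edit is h' = h + (c - u·h) u.

```python
import numpy as np


def concept_direction(acts_a, acts_b):
    """Unit vector along the difference of group means (illustrative assumption)."""
    diff = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def affine_concept_edit(acts, unit_dir, neutral):
    """Set each row's component along unit_dir to `neutral`.

    acts: (n, d) activations, unit_dir: (d,) unit vector, neutral: scalar.
    Components orthogonal to unit_dir are left unchanged.
    """
    coeff = acts @ unit_dir  # scalar projection of each row, shape (n,)
    return acts + (neutral - coeff)[:, None] * unit_dir


# Toy activations standing in for two demographic groups (synthetic data).
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(8, 16)) + 1.0
acts_b = rng.normal(size=(8, 16)) - 1.0

u = concept_direction(acts_a, acts_b)

# Assumed neutral point: midpoint of the two groups' mean projections.
neutral = 0.5 * (acts_a.mean(axis=0) @ u + acts_b.mean(axis=0) @ u)

edited_a = affine_concept_edit(acts_a, u, neutral)
edited_b = affine_concept_edit(acts_b, u, neutral)
```

After the edit, both groups have the same projection onto u, so a linear readout along that direction can no longer tell them apart. Setting neutral to 0 would reduce this to plain directional ablation, which is the linear special case.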

https://arxiv.org/pdf/2506.10922
