Contrastive activation additions

Add the average activation difference to all tokens, using simple ActAdd.

LLaMa 2 CAA (nrimsky, updated 2025 Mar 5)

Steering Llama-2 with contrastive activation additions (LessWrong)
The effects of subtracting or adding a "sycophancy vector" to one bias term. TL;DR: By just adding e.g. a "sycophancy vector" to one bias term, we o…
https://www.lesswrong.com/posts/v7f8ayBxLhmMFRzpa/steering-llama-2-with-contrastive-activation-additions

arXiv: https://arxiv.org/pdf/2312.06681
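
A rough sketch of the idea summarized in this note (average the activation difference over contrast pairs, then add it at every token position). This is a toy illustration in plain Python, not the code from the linked repository: the names mean_diff_vector and add_to_all_tokens are hypothetical, the short lists stand in for one layer's activations, and the coeff multiplier is an assumption (a negative value subtracts the vector).

```python
from statistics import fmean


def mean_diff_vector(pos_acts, neg_acts):
    """Average (positive - negative) activation difference over contrast pairs.

    pos_acts and neg_acts are equal-length lists of activation vectors,
    one vector per contrast pair (hypothetical toy format).
    """
    dim = len(pos_acts[0])
    return [
        fmean(p[i] - n[i] for p, n in zip(pos_acts, neg_acts))
        for i in range(dim)
    ]


def add_to_all_tokens(token_acts, vector, coeff=1.0):
    """Return new activations with coeff * vector added at every token position."""
    return [[a + coeff * v for a, v in zip(tok, vector)] for tok in token_acts]


# Toy data: two contrast pairs, two-dimensional activations.
pos = [[1.0, 2.0], [3.0, 2.0]]
neg = [[0.0, 1.0], [1.0, 3.0]]
vec = mean_diff_vector(pos, neg)  # per-pair diffs are [1, 1] and [2, -1]

# Steer a two-token sequence by adding 2 * vec to each token's activation.
steered = add_to_all_tokens([[0.0, 0.0], [1.0, 1.0]], vec, coeff=2.0)
print(vec, steered)
```

In a real model the same addition would be applied to one layer's activations during the forward pass; which layer and what coefficient to use are choices this sketch leaves open.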