CAST (Conditional Activation Steering)
activation-steering
IBM • Updated 2026 Feb 15 19:35
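Contrasts prompts from the Alpaca dataset (harmless) with prompts from Sorry-Bench (harmful)
- Condition vector (extracted from prompt activations; decides whether to steer)
- Refusal vector (applied to the response activations when the condition is met)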
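
The two-vector idea above can be illustrated with a toy sketch in plain Python. This is not the API of the IBM `activation-steering` library. It rests on simplifying assumptions: the condition vector is taken as a normalized difference of mean activations (the paper's extraction procedure may differ), the condition check is a plain cosine similarity against a threshold, and all names (`extract_vector`, `conditional_steer`, `threshold`, `alpha`) and the 4-dimensional "activations" are hypothetical.

```python
import math

def mean_vec(rows):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def extract_vector(pos_acts, neg_acts):
    # Assumption: unit-normalized difference of means between two
    # contrastive sets of activations (e.g. harmful vs. harmless prompts).
    diff = [p - n for p, n in zip(mean_vec(pos_acts), mean_vec(neg_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def conditional_steer(h_prompt, h_response, condition_vec, refusal_vec,
                      threshold=0.5, alpha=4.0):
    # Add the refusal vector to the response activation only when the
    # prompt activation is similar enough to the condition vector.
    if cosine(h_prompt, condition_vec) > threshold:
        return [h + alpha * r for h, r in zip(h_response, refusal_vec)]
    return list(h_response)

# Toy activations standing in for hidden states of harmful / harmless prompts.
harmful = [[2.0, 0.1, 0.0, 0.0], [1.8, -0.1, 0.0, 0.0]]
harmless = [[0.0, 0.1, 1.0, 0.0], [0.0, -0.1, 1.2, 0.0]]

condition_vec = extract_vector(harmful, harmless)
refusal_vec = [0.0, 0.0, 0.0, 1.0]   # hypothetical behavior direction
h_resp = [0.0, 0.0, 0.0, 0.0]

steered = conditional_steer(harmful[0], h_resp, condition_vec, refusal_vec)
untouched = conditional_steer(harmless[0], h_resp, condition_vec, refusal_vec)
```

In this toy setup only the harmful prompt triggers the refusal vector, while the harmless prompt leaves the response activation unchanged. That selectivity is the point of making the steering conditional.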

https://arxiv.org/pdf/2409.05907

Seonglae Cho