CAST (Conditional Activation Steering)
activation-steering
IBM • Updated 2025 Oct 30 12:7
Compares activations on the Alpaca dataset (harmless prompts) and SORRY-Bench (harmful prompts):
- Condition vector (extracted from the prompt)
- Refusal vector (applied to the response)
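
How the two vectors interact can be sketched in a few lines. This is a simplified illustration, not the API of the activation-steering library: the function names, the `threshold` and `alpha` parameters, and the "greater than" comparison are all assumptions made for the example. The idea is that the hidden state of the prompt is checked against the condition vector, and the refusal vector is added only when that check passes.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(h, c):
    """Project hidden state h onto condition vector c."""
    scale = dot(h, c) / dot(c, c)
    return [scale * x for x in c]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def conditional_steer(h, condition_vec, refusal_vec, threshold=0.5, alpha=1.0):
    """Add alpha * refusal_vec to h only when the condition is met.

    Hypothetical helper: the condition is taken to be met when the
    similarity between h and its projection onto condition_vec exceeds
    threshold (an assumed comparison direction).
    """
    sim = cosine(h, project(h, condition_vec))
    if sim > threshold:
        return [x + alpha * r for x, r in zip(h, refusal_vec)]
    return list(h)  # condition not met: leave the activation unchanged


# Toy 2-d example: the first prompt aligns with the condition, the second does not.
steered = conditional_steer([1.0, 0.0], [1.0, 0.0], [0.0, 2.0])
untouched = conditional_steer([0.0, 1.0], [1.0, 0.0], [0.0, 2.0])
```

In a real model the vectors would have the hidden dimension of a chosen layer, and the threshold and comparison direction would need to be tuned per condition.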


Seonglae Cho