CAST (Conditional Activation Steering)
activation-steering
IBM • Updated 2025 Sep 3 22:10
Compare the Alpaca dataset with SORRY-Bench
- AI condition vector (extracted from the prompt)
- Refusal vector (applied to the response)
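The two vectors above can be read as a gate and an action: the condition vector decides whether a prompt should be steered, and the refusal vector is what gets added when it is. The sketch below illustrates that idea on plain Python lists. It is a simplification and not the `activation-steering` library's API: the function names, the `threshold` and `strength` parameters, and the use of plain cosine similarity as the condition check are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def conditional_steer(hidden, condition_vec, refusal_vec,
                      threshold=0.5, strength=1.0):
    """Add the refusal vector only when the hidden state matches the condition.

    hidden        -- hypothetical hidden state taken from the prompt
    condition_vec -- direction that marks prompts to be steered
    refusal_vec   -- direction added to push the response toward refusal
    threshold     -- assumed similarity cutoff for the condition check
    strength      -- assumed scaling factor for the refusal vector
    """
    if cosine(hidden, condition_vec) > threshold:
        return [h + strength * r for h, r in zip(hidden, refusal_vec)]
    # Condition not met: leave the activation untouched.
    return list(hidden)


# A prompt aligned with the condition is steered; an unrelated one is not.
steered = conditional_steer([1.0, 0.0], [1.0, 0.0], [0.0, 2.0])
untouched = conditional_steer([0.0, 1.0], [1.0, 0.0], [0.0, 2.0])
```

In a comparison like the one noted above, a harmless instruction set (such as Alpaca) would ideally fall below the threshold and stay unsteered, while a safety benchmark (such as SORRY-Bench) would ideally trigger the condition.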
