CAST (Conditional Activation Steering)
activation-steering
IBM • Updated 2025 Jun 26 23:43
Compare the Alpaca dataset and SORRY-Bench:
- AI condition vector (extracted from the prompt)
- Refusal vector (applied to the response)
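
The two vectors above can be read as a gate plus a push: the condition vector decides whether a prompt should trigger steering, and the refusal vector is added only when it does. The following is a minimal sketch of that idea in plain Python, not the activation-steering library's API. The function names, the plain cosine-similarity test, the threshold of 0.5, and the toy 3-dimensional vectors are all assumptions for illustration; the real method works on model hidden states at chosen layers and may use a different comparison rule.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def conditional_steer(hidden, condition_vec, refusal_vec,
                      threshold=0.5, strength=1.0):
    """Add the refusal vector only if the prompt activation matches the condition.

    hidden        -- toy stand-in for a prompt hidden state (assumption)
    condition_vec -- direction that marks prompts to be refused
    refusal_vec   -- direction added to push the response toward refusal
    threshold     -- hypothetical similarity cutoff
    strength      -- hypothetical scaling of the refusal vector
    Returns (possibly steered vector, whether the condition fired).
    """
    triggered = cosine(hidden, condition_vec) > threshold
    if not triggered:
        return list(hidden), False
    steered = [h + strength * r for h, r in zip(hidden, refusal_vec)]
    return steered, True


# Toy vectors (made up): one prompt aligned with the condition, one not.
condition = [1.0, 0.0, 0.0]
refusal = [0.0, 0.0, 2.0]
harmful_like = [0.9, 0.1, 0.0]   # stands in for a SORRY-Bench-style prompt
benign_like = [0.1, 0.9, 0.0]    # stands in for an Alpaca-style prompt

steered_harmful, fired_harmful = conditional_steer(harmful_like, condition, refusal)
steered_benign, fired_benign = conditional_steer(benign_like, condition, refusal)
```

In this toy setup the prompt aligned with the condition vector gets the refusal vector added, while the other prompt passes through unchanged. That is the behavior the Alpaca versus SORRY-Bench comparison is meant to check: benign instructions should not be refused, and harmful ones should.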
