Semantics-Adaptive Dynamic Intervention
Generates dynamic steering vectors that adapt to the semantics of each input, instead of applying one fixed vector to every input
- Calculate activation differences between contrastive pairs (positive vs negative) → identify important components (attention heads, hidden states, neurons).
- During inference, apply element-wise scaling to the identified components according to the input's own activations → the intervention direction follows the semantics of the input.
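In short, it is element-wise masking derived from a contrastive dataset, as in CAA, except that the mask scales the input's own activations rather than adding a fixed vector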
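
The two steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the function names `build_mask` and `intervene`, the top-k selection by absolute mean difference, and the strength parameter `delta` are all hypothetical choices made for the example, and activations are treated as flat vectors for one component type.

```python
import numpy as np

def build_mask(pos_acts, neg_acts, top_k):
    """Binary mask over the top_k elements with the largest mean |pos - neg|.

    pos_acts, neg_acts: arrays of shape (num_pairs, dim), one row per pair.
    Selecting by absolute mean difference is an assumption of this sketch.
    """
    diff = (pos_acts - neg_acts).mean(axis=0)
    idx = np.argsort(np.abs(diff))[-top_k:]
    mask = np.zeros_like(diff)
    mask[idx] = 1.0
    return mask

def intervene(act, mask, delta):
    """Scale the masked elements in proportion to the input's own activation.

    Unmasked elements pass through unchanged; delta (hypothetical name)
    controls the intervention strength.
    """
    return act + delta * mask * act

# Toy contrastive activations: 2 pairs, 4 dimensions.
pos = np.array([[1.0, 0.0, 3.0, 0.5],
                [1.0, 0.2, 2.0, 0.5]])
neg = np.array([[0.0, 0.0, 0.0, 0.5],
                [0.0, 0.0, 1.0, 0.5]])

mask = build_mask(pos, neg, top_k=2)
steered = intervene(np.array([2.0, -1.0, 0.5, 4.0]), mask, delta=0.5)
```

Because the update is `delta * mask * act`, the effective steering vector differs for every input, whereas a CAA-style update adds the same mean-difference vector regardless of the input.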

Seonglae Cho