SADI

Creator: Seonglae Cho
Created: 2025 Dec 4 1:3
Edited: 2025 Dec 4 1:6

Semantics-Adaptive Dynamic Intervention

SADI generates a dynamic steering vector for each input, so the intervention adapts to that input's semantics. It works in two steps:
  1. Calculate the activation differences between contrastive pairs (positive vs. negative) to identify the important components (attention heads, hidden states, neurons).
  2. During inference, apply element-wise scaling to those components according to the input's own activations, so the intervention direction is semantically appropriate for that input.
In short, SADI is element-wise masking derived from a contrastive dataset, similar to CAA (Contrastive Activation Addition).
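The two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the function names `build_mask` and `steer`, the top-k selection by absolute mean difference, and the update rule `a + delta * (a * mask)` are all choices made here for illustration, and the activations are plain arrays rather than values captured from a real model.

```python
import numpy as np

def build_mask(pos_acts, neg_acts, top_k):
    """Step 1: find the components that differ most between contrastive pairs.

    pos_acts, neg_acts: arrays of shape (n_pairs, d), one row per pair.
    Returns the mean difference vector and a binary mask over the d components.
    Assumption: importance is the absolute value of the mean difference.
    """
    diff = (pos_acts - neg_acts).mean(axis=0)
    mask = np.zeros_like(diff, dtype=float)
    top = np.argsort(np.abs(diff))[-top_k:]  # indices of the k largest |diff|
    mask[top] = 1.0
    return diff, mask

def steer(activation, mask, delta):
    """Step 2: scale the masked components by the input's own activation.

    Assumption: the update is a + delta * (a * mask), so the steering vector
    depends on the current input instead of being a fixed direction.
    """
    return activation + delta * (activation * mask)

# Toy contrastive activations (hypothetical values).
pos = np.array([[1.0, 0.0, 3.0, 0.0],
                [1.0, 0.0, 5.0, 0.0]])
neg = np.array([[0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])

diff, mask = build_mask(pos, neg, top_k=1)
steered = steer(np.array([2.0, 2.0, 2.0, 2.0]), mask, delta=0.5)
```

For comparison, a CAA-style intervention would add the same fixed vector (for example `delta * diff`) to every input, whereas here the added term changes with each input's activation.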
