EMA Probe

Creator
Seonglae Cho
Created
2026 Feb 10 18:0
Edited
2026 Feb 13 12:51
In long contexts, mean pooling dilutes a brief burst of malicious tokens: a few high-scoring tokens are averaged away among many benign ones. The EMA probe targets exactly that case. When a recent malicious signal spikes, the EMA rises, and taking the maximum captures the spike. The procedure has two steps. First, train a standard linear probe with mean pooling. Then, at inference, smooth the per-token probe scores with an exponential moving average (EMA) and use the maximum EMA value over the sequence as the final score.
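The inference step can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation. It assumes the per-token scores have already been produced by the trained linear probe, that the EMA is initialized at the first token's score, and that the smoothing weight `alpha` is a free hyperparameter. The function names, the default `alpha = 0.5`, and the toy scores are all hypothetical.

```python
from typing import Sequence


def mean_score(token_scores: Sequence[float]) -> float:
    """Mean-pooled score: a brief spike is diluted by the sequence length."""
    return sum(token_scores) / len(token_scores)


def ema_max_score(token_scores: Sequence[float], alpha: float = 0.5) -> float:
    """Maximum of the running EMA over per-token probe scores.

    alpha is an assumed smoothing weight in (0, 1]; larger values react
    faster to recent tokens. The EMA starts at the first token's score,
    which is an assumption of this sketch.
    """
    if not token_scores:
        raise ValueError("token_scores must be non-empty")
    ema = token_scores[0]
    best = ema
    for score in token_scores[1:]:
        # Standard EMA update: blend the new score with the running value.
        ema = alpha * score + (1.0 - alpha) * ema
        # Track the highest EMA value seen so far.
        best = max(best, ema)
    return best


# Toy input: 1000 tokens, 3 of them "malicious" (score 1.0) in the middle.
scores = [0.0] * 500 + [1.0] * 3 + [0.0] * 497

print(mean_score(scores))     # the spike is almost invisible in the mean
print(ema_max_score(scores))  # the EMA max retains the spike
```

On this toy input the mean is 3/1000, while the EMA climbs 0.5, 0.75, 0.875 across the three high-scoring tokens, so the maximum stays high regardless of how long the benign context is. A smaller `alpha` requires a longer run of high scores before the EMA rises, which trades sensitivity to single-token noise against sensitivity to short bursts.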