Gemini Probe

Created: 2026 Feb 10 17:56
Creator: Seonglae Cho
Edited: 2026 Feb 13 12:51

Probe structure (the aggregation step) determines performance → MultiMax / Rolling Attn / AlphaEvolve

Existing linear/EMA/mean probes have practical limitations → they should be replaced with new aggregations/architectures such as MultiMax / Rolling Attn / AlphaEvolve. However, even the new probes can barely defend against adaptive attacks.
  • Best probe test error ≈ 2.5%
  • Jailbreak success rate (FNR) always remains at 1–3% or higher
  • More vulnerable to ART/adaptive attacks
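To make the aggregation point concrete, here is a minimal sketch of how different pooling choices react to a short high-scoring span inside a long, mostly benign sequence. Everything in it is an assumption for illustration: the per-token scores are made up, the function names are hypothetical, the EMA variant simply returns the final running value, and `rolling_window_max` uses uniform window weights as a simplified stand-in for a local-window attention probe rather than the architecture from the source.

```python
from typing import List, Sequence


def mean_pool(scores: Sequence[float]) -> float:
    """Average the per-token probe scores over the whole sequence."""
    return sum(scores) / len(scores)


def ema_pool(scores: Sequence[float], alpha: float = 0.1) -> float:
    """Exponential moving average; returns the final running value (assumed variant)."""
    ema = scores[0]
    for s in scores[1:]:
        ema = alpha * s + (1.0 - alpha) * ema
    return ema


def max_pool(scores: Sequence[float]) -> float:
    """Max pooling: the single highest per-token score decides."""
    return max(scores)


def rolling_window_max(scores: Sequence[float], window: int = 5) -> float:
    """Mean over each local window, then max over windows.

    Uniform weights are used here; a learned attention probe would
    replace them with data-dependent weights.
    """
    if len(scores) <= window:
        return mean_pool(scores)
    window_means: List[float] = [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
    return max(window_means)


# Hypothetical scores: 5 suspicious tokens buried in 95 benign ones.
scores = [0.05] * 50 + [0.9] * 5 + [0.05] * 45

mean_score = mean_pool(scores)              # diluted by the benign context
ema_score = ema_pool(scores)                # decays back after the span ends
max_score = max_pool(scores)                # keeps the peak
rolling_score = rolling_window_max(scores)  # keeps the peak of a local span
```

With these made-up numbers, the mean and final-EMA values stay close to the benign baseline while the max-based aggregations retain the spike, which is the intuition behind preferring max-style or local-window aggregation for long inputs.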
Call the LLM only when the probe's verdict is ambiguous → cost drops to roughly 1/50 while performance is maintained
Aggregation/architecture variants:
  • MultiMax → max pooling
  • Rolling Attn → local window + attention
  • AlphaEvolve → automatic structure search
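The "call the LLM only when ambiguous" idea noted above can be sketched as a two-threshold cascade: the cheap probe decides confident cases on its own, and only scores in the middle band are passed to the expensive LLM classifier. This is a sketch under stated assumptions, not the source's implementation: the thresholds `low` and `high`, the function names, and the toy probe and judge are all hypothetical.

```python
from typing import Callable, List, Tuple


def cascade_classify(
    text: str,
    probe: Callable[[str], float],
    llm_judge: Callable[[str], bool],
    low: float = 0.2,   # assumed threshold: at or below this, treat as benign
    high: float = 0.8,  # assumed threshold: at or above this, treat as harmful
) -> Tuple[bool, bool]:
    """Return (flagged, used_llm) for one input."""
    score = probe(text)
    if score <= low:
        return False, False       # probe is confident: benign
    if score >= high:
        return True, False        # probe is confident: harmful
    return llm_judge(text), True  # ambiguous band: defer to the LLM


# Toy stand-ins for a real probe and a real LLM classifier (hypothetical).
def toy_probe(text: str) -> float:
    if "attack" in text:
        return 0.95
    if "maybe" in text:
        return 0.5
    return 0.05


def toy_llm_judge(text: str) -> bool:
    return "harmful" in text


inputs: List[str] = ["hello", "attack plan", "maybe harmful", "maybe fine"]
results = [cascade_classify(t, toy_probe, toy_llm_judge) for t in inputs]
llm_calls = sum(used for _, used in results)
```

The cost saving depends entirely on how small a fraction of traffic lands in the ambiguous band; the thresholds trade LLM calls against the error rate of the probe-only decisions.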
arxiv.org
 
 
 
