ICL Saturation Nature

Creator: Seonglae Cho
Created: 2026 Jan 1 15:59
Edited: 2026 Jan 1 16:03
ICL sigmoid: as the number of in-context examples increases, model behavior follows an S-shaped curve: minimal change → sudden sharp transition → saturation. The jailbreak threshold is the inflection point in the middle of this sigmoid curve. When accumulated examples push the model's internal beliefs past this critical point, the model suddenly exhibits risky or prohibited behaviors it previously avoided. Near the threshold, just 1–2 additional examples, or even weak activation steering, can trigger a sharp behavioral flip between safe and violation modes.

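The S-shaped curve and threshold sensitivity above can be sketched with a toy model. The assumption (illustrative only, not taken from the cited papers) is that each in-context example contributes a fixed log-likelihood increment, so total log-odds grow linearly with example count while the behavioral probability traces a sigmoid:

```python
import math

def p_violation(n_examples, prior_logodds=-6.0, evidence_per_example=0.5):
    """Toy belief model: probability the model flips into the demonstrated
    behavior after n_examples in-context demonstrations.

    All numbers here are hypothetical; a fixed per-example log-odds
    increment is an assumption made for illustration.
    """
    logodds = prior_logodds + evidence_per_example * n_examples
    return 1.0 / (1.0 + math.exp(-logodds))

# Flat start, sharp transition near the inflection point (n ≈ 12), saturation.
curve = [p_violation(n) for n in range(25)]
```

With these illustrative parameters the probability is near zero for small n, crosses 0.5 at n = 12, and saturates near 1, and moving from 11 to 13 examples alone flips the model across the midpoint, matching the "1–2 additional examples" sensitivity described above.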
In-context learning and Activation Steering are fundamentally the same mechanism: both modify the model's Belief State about latent concepts. ICL accumulates evidence (likelihood) through contextual examples, while steering directly shifts the prior probability. The two combine additively in log-belief space, so small adjustments can cause sharp behavioral phase shifts. This perspective enables prediction and explanation of phenomena like many-shot ICL's sigmoid learning curve and jailbreak thresholds.
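The additive combination can be made concrete with a minimal sketch. The units below are hypothetical: the point is only that an evidence term (from examples) and a prior shift (from steering) sum in log-odds space, so a weak steering intervention can substitute for the few extra examples needed to cross the threshold:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical log-odds contributions (illustrative values, not measured).
prior = -6.0        # aligned model's prior against the behavior
per_example = 0.5   # evidence contributed by each in-context demonstration
steer = 2.0         # small prior shift from a weak steering intervention

# Ten examples alone leave total log-odds at -1.0, below the threshold...
p_icl_only = sigmoid(prior + per_example * 10)
# ...but the same ten examples plus the steering shift reach +1.0,
# because the two routes add in log-belief space.
p_combined = sigmoid(prior + steer + per_example * 10)
```

In this sketch the steering shift of 2.0 is worth exactly four additional examples (2.0 / 0.5), which is one way to read the claim that ICL and steering are interchangeable currencies in the same belief update.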
Refs (arxiv.org):
- "Improved Few-Shot Jailbreaking Can Circumvent Aligned Language..." (many-shot jailbreaking observes a sharp transition): Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability.
- "Many-Shot In-Context Learning" (many-shot ICL reports phase-transition-like behavior): large language models excel at few-shot in-context learning, learning from a few examples provided in context at inference, without any weight updates.