In-context learning ability
During this Phase Change, the majority of in-context learning ability (as measured by difference in loss between tokens early and late in the sequence) is acquired, and simultaneously induction heads form within the model that are capable of implementing fairly abstract and fuzzy versions of pattern completion.

In-context learning score
The loss of the 500th token in the context minus the loss of the 50th token in the context, averaged over dataset examples.
One might wonder if the sudden increase is somehow an artifact of the choice to define in-context learning in terms of the difference between the 500th and 50th tokens. An easy way to see that this is a robust phenomenon is to look at the derivative of loss with respect to the logarithm of the token index in context. You can think of this as measuring something like "in-context learning per ε% increase in context length." We can visualize this on a 2D plot, where one axis is the amount of training that has elapsed and the other is the token index being predicted. Before the phase change, loss largely stops improving around token 50; after the phase change, loss continues to improve well past that point.
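Both quantities can be computed directly from an array of per-token losses. A minimal sketch (assumed array names and shapes, not Anthropic's code):

```python
import numpy as np

def in_context_learning_score(per_token_loss: np.ndarray) -> float:
    """Loss of the 500th token minus loss of the 50th token, averaged over examples.
    per_token_loss: shape [num_examples, seq_len], with seq_len >= 500."""
    mean_loss = per_token_loss.mean(axis=0)        # average over dataset examples
    return float(mean_loss[499] - mean_loss[49])   # 0-indexed positions 500 and 50

def loss_per_log_token_index(per_token_loss: np.ndarray) -> np.ndarray:
    """Derivative of mean loss with respect to log(token index):
    'in-context learning per eps% increase in context length'."""
    mean_loss = per_token_loss.mean(axis=0)
    token_index = np.arange(1, mean_loss.shape[0] + 1)
    return np.gradient(mean_loss, np.log(token_index))

# Stacking loss_per_log_token_index over training checkpoints gives the 2D
# (training elapsed x token index) picture described above.
```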

Prefix matching score
Anthropic go through the attention heads of a model and score them for whether they are induction heads, using a prefix-matching score that measures their ability to perform the task used to define an induction head (a minimal scoring sketch follows the properties list below).
Induction head Properties
- Prefix matching: The head attends back to previous tokens that were followed by the current and/or recent tokens. That is, it attends to the token which induction would suggest comes next.
- Copying: The head’s output increases the logit corresponding to the attended-to token.
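As a rough illustration of the prefix-matching property, one can score a head's attention pattern on a random token block repeated twice; the head should attend from each token in the second block to the token that followed its earlier occurrence. A minimal sketch (the `attn` array and its layout are assumptions, not Anthropic's exact scoring code):

```python
import numpy as np

def prefix_matching_score(attn: np.ndarray, block_len: int) -> float:
    """attn[i, j] = attention from query position i to key position j, measured on a
    sequence of length 2 * block_len where tokens[i] == tokens[i - block_len].
    For each query i in the repeated block, the induction target is the token that
    followed the current token's earlier occurrence: position i - block_len + 1.
    The score is the average attention mass placed on that target."""
    targets = [(i, i - block_len + 1) for i in range(block_len, 2 * block_len)]
    return float(np.mean([attn[i, j] for i, j in targets]))

# The copying property would be checked separately, e.g. by testing whether the head's
# OV circuit maps a token's embedding to an increase in that same token's logit.
```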


This already strongly suggests some connection between induction heads and in-context learning, but beyond that, this window appears to be a pivotal point for the training process in general: whatever is occurring is visible as a bump on the training curve (figure above). It is in fact the only place in training where the loss is not convex, i.e. the only stretch where the rate of improvement does not simply decrease monotonically. The more interesting connection is that this picture also explains Deep double descent really well.
That might not sound significant, but the loss curve is an average over many thousands of tokens. Many behaviors people find interesting in language models, such as the emergence of arithmetic, would be microscopic on the loss curve.
In-context learning scoring
- Few-shot learning (micro perspective, focusing on specific tasks)
- Loss at different token indices (macro perspective, an average measure that correlates with task performance)

Per-Token Loss Analysis
To better understand how models evolve during training, Anthropic analyze what they call "per-token loss vectors." The core idea traces back to an earlier method and, more generally, to the idea of "function spaces" in mathematics. It lets them summarize the main dimensions of variation in how several models' predictions change over the course of training.

Anthropic also apply principal component analysis (PCA) to the per-token losses, which summarizes the main dimensions of variation in how several models' predictions change over the course of training. Around the phase change, the number of principal components needed to explain that variation increases, which is what you would expect when more of the complexity and diversity of information in the dataset is being captured. This can be interpreted as the model taking a wider range of features into account when choosing tokens; in other words, it considers more factors when evaluating the importance of a specific token.
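A minimal sketch of this kind of analysis (the data layout and file name are hypothetical, and this is not Anthropic's code): treat each training checkpoint as a point whose coordinates are its per-token losses, then look at how many components are needed and how the checkpoints move through PC space.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical array: loss of each checkpoint on a fixed set of evaluation tokens,
# shape [num_checkpoints, num_tokens].
per_token_loss = np.load("per_token_loss.npy")

pca = PCA(n_components=10)
trajectory = pca.fit_transform(per_token_loss)   # checkpoints projected onto the PCs

# Components needed to explain 95% of the variance; a jump in this number around the
# phase change would indicate more independent directions of variation.
explained = np.cumsum(pca.explained_variance_ratio_)
n_needed = int(np.searchsorted(explained, 0.95)) + 1
print(n_needed)
print(trajectory[:, :2])   # the first two PCs trace out the training trajectory
```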

If a sequence of tokens occurs multiple times, the model is better at predicting the sequence the second time it shows up. On the other hand, if a token is followed by a different token than it previously was, the post-phase-change model is worse at predicting it.
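This is easy to probe with any off-the-shelf causal language model. A minimal sketch (GPT-2 via Hugging Face is an assumed stand-in, not one of the models studied in the paper): repeat a random token block and compare per-token loss on the first versus the second occurrence. A model with working induction heads should show a much lower loss on the repeat.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

block = torch.randint(1000, 5000, (1, 100))    # a block of random token ids
input_ids = torch.cat([block, block], dim=1)   # the same block repeated twice

with torch.no_grad():
    logits = model(input_ids).logits

# Per-token loss: predict token t+1 from tokens up to t.
loss = torch.nn.functional.cross_entropy(
    logits[0, :-1], input_ids[0, 1:], reduction="none"
)
first = loss[:99].mean().item()     # predictions inside the first occurrence
second = loss[100:].mean().item()   # same relative positions inside the repeat
print(f"first occurrence: {first:.2f}  second occurrence: {second:.2f}")
```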

Anthropic broke the loss curve apart and looked at the loss curves for individual tokens.

Gradient
The gradient-descent updates applied during backpropagation when training the model and the matrix operations performed in the transformer's attention layers during inference are mathematically similar.
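For softmax-free (linear) attention this similarity can be written down exactly, in the spirit of the decomposition used in the "Why Can GPT Learn In-Context?" paper linked below: the contribution of the demonstration tokens acts like an update added to the weights the model would use without them. A minimal sketch with assumed shapes and names:

```python
import numpy as np

d, n_demo = 8, 5
rng = np.random.default_rng(0)

q = rng.normal(size=(d,))               # query vector for the current token
K_demo = rng.normal(size=(d, n_demo))   # keys of the demonstration (context) tokens
V_demo = rng.normal(size=(d, n_demo))   # values of the demonstration tokens
K_zsl = rng.normal(size=(d, 3))         # keys of the query text itself ("zero-shot" part)
V_zsl = rng.normal(size=(d, 3))         # values of the query text itself

# Linear attention over demonstrations and query text together:
attn_out = V_zsl @ K_zsl.T @ q + V_demo @ K_demo.T @ q

# The same output, rewritten as a fixed "zero-shot" matrix plus a context-induced
# update, analogous to a gradient-descent step applied at inference time:
W_zsl = V_zsl @ K_zsl.T
delta_W = V_demo @ K_demo.T
assert np.allclose(attn_out, (W_zsl + delta_W) @ q)
```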
Reverse engineering with Induction head in Multi-head Attention
Stanford CS25: V1 I Transformer Circuits, Induction Heads, In-Context Learning
"Neural network parameters can be thought of as compiled computer programs. Somehow, they encode sophisticated algorithms, capable of things no human knows how to write a computer program to do. Mechanistic interpretability seeks to reverse engineer neural networks into human understandable algorithms. Previous work has tended to focus on vision models; this talk will explore how we might reverse engineer transformer language models.
In particular, we'll focus on what we call "induction head circuits", a mechanism that appears to be significantly responsible for in-context learning. Using a pair of attention heads, these circuits allow models to repeat text from earlier in the context, translate text seen earlier, mimic functions from examples earlier in the context, and much more. The discovery of induction heads in the learning process appears to drive a sharp phase change, creating a bump in the loss curve, pivoting models' learning trajectories, and greatly increasing their capacity for in-context learning, in the span of just a few hundred training steps."
Chris Olah is a co-founder of Anthropic, an AI company focused on the safety of large models, where he leads Anthropic's interpretability efforts. Previously, Chris led OpenAI's interpretability team, and was a researcher at Google Brain. Chris' work includes the Circuits project, his blog (especially his tutorial on LSTMs), the Distill journal, and DeepDream.
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM
https://www.youtube.com/watch?v=pC4zRb_5noQ

In-context Learning and Induction Heads
As Transformer generative models continue to scale and gain increasing real-world use, addressing their associated safety problems becomes increasingly important. Mechanistic interpretability – attempting to reverse engineer the detailed computations performed by the model – offers one possible avenue for addressing these safety issues. If we can understand the internal structures that cause Transformer models to produce the outputs they do, then we may be able to address current safety problems more systematically, as well as anticipate safety problems in future, more powerful models. Note that mechanistic interpretability is a subset of the broader field of interpretability, which encompasses many different methods for explaining the outputs of a neural network. Mechanistic interpretability is distinguished by a specific focus on trying to systematically characterize the internal circuitry of a neural net.
https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html
Insights on pre-training and the attention mechanism
Are Emergent Abilities in Large Language Models just In-Context Learning?
Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex...
https://arxiv.org/abs/2309.01809

Why Can GPT Learn In-Context? Language Models Implicitly Perform...
Large pretrained language models have shown surprising in-context learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen input without...
https://arxiv.org/abs/2212.10559

The paper argues that when a transformer block (consisting of attention layers and MLP) receives context, it transforms that context into an implicit low-rank (specifically rank-1) update to the weights of the first layer of the MLP. The token consumption process creates gradient-descent-like dynamics. However, since the experiments primarily focus on single blocks, first tokens, and linear setups, there's insufficient evidence to support the claim that "this is the major mechanism explaining ICL in actual large LLMs." While it could serve as indirect evidence for the Linear Representation Hypothesis, the persuasiveness is limited because the data itself is in a linear regression setting.
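To make the rank-1 claim concrete, here is a generic construction (a sketch of the general idea, not necessarily the paper's exact derivation): if the context only changes the vector that enters the MLP for the query token, that change can always be absorbed into a rank-1 update of the MLP's first weight matrix. Names and shapes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_mlp = 16, 64

W1 = rng.normal(size=(d_mlp, d_model))   # first MLP layer weights
a_no_ctx = rng.normal(size=(d_model,))   # MLP input for the query token, no context
a_ctx = rng.normal(size=(d_model,))      # MLP input for the query token, with context

# Rank-1 update that makes the no-context input produce the with-context activation.
delta = a_ctx - a_no_ctx
delta_W = np.outer(W1 @ delta, a_no_ctx) / (a_no_ctx @ a_no_ctx)

assert np.allclose((W1 + delta_W) @ a_no_ctx, W1 @ a_ctx)
assert np.linalg.matrix_rank(delta_W) == 1   # the update is rank-1 by construction
```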
The autoregressive structure itself, rather than model size or the prompt, is the key factor; ICL relies on ad-hoc state updates without explicit weight updates.
There is minimal cross-task representation sharing (low cross-task transfer); in-context representations formed in one task do not transfer to other tasks. This is because ICL learns in "temporary runtime state space" rather than model parameter space. Consequently, this suggests that ICL is a non-reusable, task-specific temporary adaptation.
ICL operates on statistical token alignment and contextual memory rather than on semantic understanding or linguistic comprehension of the prompt, and it is a temporary (local), task-specific form of learning. In other words, the core mechanism is "instantaneous state compression" rather than a "generalizable internal representation." It coexists within a broad representation space, consistent with the Superposition Hypothesis, with each task learned through massive amounts of training data, which is why it improves performance.
Stanford video
In-Context Learning & "Model Systems" Interpretability (Stanford lecture 3) - Ekdeep Singh Lubana
What counts as an explanation of how an LLM works?
In our last Stanford guest lecture, Ekdeep explains the different levels of analysis in interpretability, and outlines his neuro-inspired "model systems approach".
Plus, how in-context learning and many-shot jailbreaking are explained by LLM representations changing in-context (as a case study for that approach).
00:33 - What counts as an explanation?
04:47 - Levels of analysis & standard interpretability approaches
18:19 - The "model systems" approach to interp
(Case study on in-context learning)
23:36 - How LLM representations change in-context
44:10 - Modeling ICL with rational analysis
1:10:54 - Conclusion & questions
Read more about our research: https://www.goodfire.ai/research
Follow us on X: https://x.com/GoodfireAI
https://www.youtube.com/watch?v=q-yo6TPRPVk


Seonglae Cho