Lottery Ticket Hypothesis

Creator
Seonglae Cho
Created
2025 Jan 20 12:10
Edited
2026 Mar 3 15:38

LTH

The hypothesis states that within a large neural network, there exists a small, sparse subnetwork (a "winning ticket") that, when trained in isolation with its original initialization weights (or values very close to them), can match the performance of the full network.
  • Randomly initialized large network ⟶ training ⟶ pruning
  • Certain subnetworks, retrained from their original initialization weights, still match the full network's performance
  • Performance depends heavily on "structure + initialization"
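The train → prune → rewind → retrain loop above can be sketched on a toy linear model. This is a minimal illustration, not the paper's setup: names like `w_ticket`, the 25% sparsity level, and the sparse regression task are all assumptions chosen to keep the demo small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression with a sparse ground truth, standing in for a
# "large network": only 5 of the 20 weights actually matter.
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:5] = [1.5, -2.0, 1.0, -1.2, 2.3]
y = X @ w_true

def train(w, mask, steps=500, lr=0.1):
    """Gradient descent on MSE; pruned (mask==0) weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        w = (w - lr * grad) * mask
    return w

w_init = rng.normal(size=20) * 0.1            # 1. random init (saved for rewinding)
w_dense = train(w_init.copy(), np.ones(20))   # 2. train the dense model

# 3. prune: keep only the 25% largest-magnitude trained weights
keep = np.argsort(np.abs(w_dense))[-5:]
mask = np.zeros(20)
mask[keep] = 1.0

# 4. rewind survivors to their ORIGINAL init values, retrain the subnetwork
w_ticket = train(w_init * mask, mask)

dense_loss = np.mean((X @ w_dense - y) ** 2)
ticket_loss = np.mean((X @ w_ticket - y) ** 2)
print(f"dense loss {dense_loss:.2e}, ticket loss {ticket_loss:.2e}")
```

In this toy case the rewound subnetwork reaches essentially the same loss as the dense model, mirroring the hypothesis: the pruned structure plus its original initialization is enough.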

MLP Interpretability

A paper explaining the internal mechanism of the Grokking phenomenon in small neural networks learning modular addition, through Fourier features, lottery-ticket structure, and a phase-alignment process. What the model actually learns: when a two-layer network solves modular addition, each neuron learns a single-frequency Fourier feature. In other words, it solves the task by recasting it as periodic-signal decomposition rather than arithmetic. Previous work only observed that "neurons learn frequencies"; this paper explains how those features compose into a complete algorithm and why generalization occurs suddenly. Modular addition was chosen as the toy model because it can be expressed exactly in a Fourier basis, making the internal mechanism precisely analyzable.
After the memorization phase, phase alignment brings the frequencies' phases into agreement, so the whole structure operates as a single algorithm; grokking then follows as an explosion in generalization performance. In other words, Grokking is not feature discovery but the alignment, or composition, of already-discovered features. As in the Lottery Ticket Hypothesis, a subnetwork capable of implementing the correct algorithm already exists within the network; learning is the process of "activating" that structure.