Deep double descent

Creator

Seonglae Cho

Created

2024 Sep 14 20:18

Editor

Seonglae Cho

Edited

2025 Jun 2 11:44

Refs

Bias-Variance Trade-off

In-context learning ability

Neural Network Phase Change

Double descent of Generalization performance

First descent

Overfitting

Second descent

Deep learning poses a challenge to classical statistical learning theory. Neural networks often reach zero training error yet still generalize well to unseen data. This contradicts the classical expectation that a model which interpolates its training data will overfit, and it renders many classical generalization bounds vacuous.
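The first descent, the overfitting peak, and the second descent can often be reproduced in a toy setting far smaller than a neural network. The sketch below is only an illustration under stated assumptions, not the setup used in the linked papers: it fits minimum-norm least squares on random ReLU features and varies the number of features around the interpolation threshold (number of features equal to number of training points). The linear target, noise level, sample sizes, and the names `relu_features`, `run_trial`, and `double_descent_curve` are all hypothetical choices made for this example.

```python
import numpy as np


def relu_features(x, w):
    """Random ReLU feature map: phi(x) = max(0, x @ w)."""
    return np.maximum(x @ w, 0.0)


def run_trial(n_features, rng, n_train=40, n_test=500, dim=10, noise=0.5):
    """Fit min-norm least squares on random features; return (train_mse, test_mse)."""
    # Hypothetical data model: noisy linear target with roughly unit signal variance.
    beta = rng.normal(size=dim) / np.sqrt(dim)
    x_train = rng.normal(size=(n_train, dim))
    x_test = rng.normal(size=(n_test, dim))
    y_train = x_train @ beta + noise * rng.normal(size=n_train)
    y_test = x_test @ beta + noise * rng.normal(size=n_test)

    # Fixed random first layer; only the linear readout is trained.
    w = rng.normal(size=(dim, n_features)) / np.sqrt(dim)
    phi_train = relu_features(x_train, w)
    phi_test = relu_features(x_test, w)

    # lstsq returns the minimum-norm solution when the system is underdetermined,
    # so for n_features > n_train this is an interpolating fit.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    train_mse = np.mean((phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    return train_mse, test_mse


def double_descent_curve(widths=(5, 10, 20, 40, 80, 400), n_trials=20, seed=0):
    """Median (train_mse, test_mse) over several random trials for each width."""
    rng = np.random.default_rng(seed)
    curve = {}
    for p in widths:
        results = np.array([run_trial(p, rng) for _ in range(n_trials)])
        curve[p] = tuple(np.median(results, axis=0))
    return curve


if __name__ == "__main__":
    for p, (train_mse, test_mse) in double_descent_curve().items():
        print(f"features={p:4d}  train_mse={train_mse:.3g}  test_mse={test_mse:.3g}")
```

In this sketch the test error is expected to be worst near 40 features, where the model can only just interpolate the 40 noisy training points, and to fall again as the width grows well past that threshold while the training error stays at essentially zero. The median over trials is used because the error at the threshold is heavy-tailed and a mean would be dominated by a few extreme runs.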

Sparse activation and the Superposition Hypothesis have been proposed as possible explanations for the Grokking phenomenon, where models learn to activate sparsely and generalize well after initially overfitting when trained on very large datasets.

Figure: Modern interpolating regime by Belkin et al. (2018)

Figure: Grokking

https://openai.com/index/deep-double-descent/

Deep double descent

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens, and view further study of this phenomenon as an important research direction.

Deep Double Descent: Where Bigger Models and More Data Hurt

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show...

https://arxiv.org/abs/1912.02292
