Deep double descent

Creator: Seonglae Cho
Created: 2024 Sep 14 20:18
Edited: 2025 Jun 2 11:44

Double descent of generalization performance unfolds in three phases:

  1. First descent: test error falls as model capacity grows, matching the classical regime
  2. Overfitting: test error peaks near the interpolation threshold, where the model barely fits the training data
  3. Second descent: test error falls again as capacity grows past the threshold
Deep Learning presents a challenge to classical statistical learning theory. Neural networks often achieve zero training error, yet they generalize well to unseen data. This contradicts traditional expectations and makes many classical generalization bounds ineffective.
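A minimal sketch (not from the original note) of how the curve can be reproduced empirically: minimum-norm least squares on random ReLU features, sweeping the number of features D across the interpolation threshold D ≈ n_train. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10

# Synthetic linear task with label noise on the training set
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true

def random_relu_features(X, W):
    """Project inputs through fixed random weights and apply ReLU."""
    return np.maximum(X @ W, 0.0)

for D in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, D)) / np.sqrt(d)
    Phi_tr = random_relu_features(X_train, W)
    Phi_te = random_relu_features(X_test, W)
    # lstsq returns the minimum-norm solution when D > n_train;
    # that implicit regularization is what produces the second descent.
    coef, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    train_mse = np.mean((Phi_tr @ coef - y_train) ** 2)
    test_mse = np.mean((Phi_te @ coef - y_test) ** 2)
    print(f"D={D:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")
# Expected shape: test MSE falls, spikes near D ≈ n_train, then falls again.
```

Training error reaches (near) zero once D passes n_train, while test error traces the three phases listed above.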
Sparse activation and the Superposition Hypothesis have been proposed as possible explanations for the Grokking phenomenon, in which models initially overfit and then, when trained far past the point of memorization, learn to activate sparsely and generalize well.
Modern interpolating regime by Belkin et al. (2018)
Grokking
https://openai.com/index/deep-double-descent/
