Overfitting

Creator: Seonglae Cho
Created: 2021 May 31 5:45
Edited: 2025 Jun 2 11:51

Fits the training dataset better than the test set: high Variance (Spurious correlation)

Learning to memorize individual samples rather than generalizing from the data
Models that are bigger or have more capacity are more likely to overfit. In other words, a large number of parameters can cause overfitting because it allows the model to fit the training data more closely, resulting in higher variance. However, if the data is general enough, overfitting can be benign, as in
Deep double descent
.
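A minimal sketch of this capacity effect (toy data and hypothetical degrees chosen for illustration, numpy only): a high-degree polynomial fitted to a handful of noisy samples drives training error toward zero while test error grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(x) on [0, 2*pi] (toy data-generating distribution)."""
    x = rng.uniform(0, 2 * np.pi, n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

for degree in (3, 14):  # low- vs high-capacity model
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit (high degree may warn: ill-conditioned)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The degree-14 polynomial nearly interpolates the 15 training points, so its training error is close to zero while its test error is typically much larger than the degree-3 fit's.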
Overfitting is related to information stored in the weights that encodes the specific training set rather than the data-generating distribution. This corresponds to reducing the concentration of the distribution of weight vectors output by the training algorithm.
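A toy illustration of this point (assumed setup: an over-parameterized linear model with more parameters than samples, fitted to purely random labels): the weights can interpolate the training labels exactly, so they must encode that particular training set, while accuracy on fresh random labels stays at chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, dim = 50, 1000, 200               # over-parameterized: dim > n_train

X_train = rng.normal(size=(n_train, dim))
y_train = rng.choice([-1.0, 1.0], size=n_train)    # pure-noise labels, no signal to learn
X_test = rng.normal(size=(n_test, dim))
y_test = rng.choice([-1.0, 1.0], size=n_test)

w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)  # minimum-norm interpolating weights

train_acc = np.mean(np.sign(X_train @ w) == y_train)   # ~1.0: the weights memorize the training set
test_acc = np.mean(np.sign(X_test @ w) == y_test)      # ~0.5: nothing learned about any distribution
print(f"train accuracy={train_acc:.2f}  test accuracy={test_acc:.2f}")
```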

Modern approach

Deep Learning
presents a challenge to classical statistical learning theory. Neural networks often achieve zero training error, yet they generalize well to unseen data. This contradicts traditional expectations and makes many classical generalization bounds ineffective.
Sparse activation and the
Superposition Hypothesis
have been proposed as possible explanations for the
Grokking
phenomenon, where a model first overfits and then, after prolonged training, comes to activate sparsely and generalize well to unseen data.
Modern interpolating regime by Belkin et al. (2018)
Grokking
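A hedged sketch of the interpolating regime above (assumptions: random ReLU features with a minimum-norm least-squares fit; the exact peak height and location depend on noise and sample size): test error typically spikes as the number of features approaches the number of training samples and falls again in the over-parameterized regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, dim, noise = 100, 2000, 10, 0.1

def target(X):
    return np.sin(X @ np.ones(dim))                # arbitrary smooth ground truth

X_train = rng.normal(size=(n_train, dim))
y_train = target(X_train) + noise * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, dim))
y_test = target(X_test)

for n_features in (10, 50, 100, 200, 1000):        # sweep model capacity
    W = rng.normal(size=(dim, n_features))          # fixed random first layer
    phi_train = np.maximum(X_train @ W, 0)          # ReLU random features
    phi_test = np.maximum(X_test @ W, 0)
    beta = np.linalg.pinv(phi_train) @ y_train      # minimum-norm least-squares fit
    train_mse = np.mean((phi_train @ beta - y_train) ** 2)
    test_mse = np.mean((phi_test @ beta - y_test) ** 2)
    print(f"features={n_features:5d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Around 100 features (the interpolation threshold, equal to the number of training samples) the test error is usually at its worst; past that point, added capacity typically brings it back down, matching the modern interpolating regime curve.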

Recommendations