Underfitting

Creator
Seonglae Cho
Created
2023 Mar 14 2:20
Edited
2025 Jun 13 18:11

The training error is relatively large, i.e., high Bias.

Overly simple models underfit, and overly complex models overfit.

Local KL Volume

This methodology defines a set of KL-neighbors (a behaviorally similar region) around the trained model weights and efficiently estimates the probability that this region occupies under the initialization distribution (the local KL volume) using the Monte Carlo Method with Importance sampling. The local KL volume measures, from the perspective of the initialization distribution, the "size" of the parameter region where the output distribution remains nearly unchanged (KL divergence ≤ ε):
C(\theta) = \mathbb{E}_{x}\Bigl[D_{\mathrm{KL}}\bigl(f(x;\theta_{0}) \,\|\, f(x;\theta)\bigr)\Bigr] \le \varepsilon
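As a rough illustration (not the authors' implementation), the sketch below estimates a Euclidean log-volume lower bound for such a region on a toy softmax model: sample random directions in weight space, bisect for the radius where the average KL reaches ε, and combine the radii. The actual method additionally weights the region by the initialization distribution and uses importance sampling, which this sketch omits; all names and constants here are illustrative.

```python
# Minimal sketch of local KL volume estimation on a toy softmax model.
# Not the paper's algorithm: it returns a Euclidean log-volume lower bound
# rather than probability under the init distribution, and omits importance
# sampling. All dimensions, thresholds, and helper names are illustrative.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

D_IN, D_OUT, N_X = 8, 4, 256      # toy model and evaluation-set sizes
EPS = 0.05                         # KL threshold defining the neighborhood
N_DIRS = 200                       # Monte Carlo directions

X = rng.normal(size=(N_X, D_IN))                    # fixed inputs for E_x[...]
theta_star = rng.normal(size=(D_IN, D_OUT)) * 0.5   # stand-in "trained" weights
n_params = theta_star.size

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def avg_kl(theta):
    """E_x[D_KL(f(x; theta_star) || f(x; theta))] over the fixed inputs."""
    p, q = softmax(X @ theta_star), softmax(X @ theta)
    return np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1))

def boundary_radius(direction, r_hi=10.0, iters=40):
    """Bisect for the largest r with avg_kl(theta_star + r * direction) <= EPS."""
    r_lo = 0.0
    while avg_kl(theta_star + r_hi * direction) <= EPS:  # grow until outside
        r_hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (r_lo + r_hi)
        if avg_kl(theta_star + mid * direction) <= EPS:
            r_lo = mid
        else:
            r_hi = mid
    return r_lo

log_radii = []
for _ in range(N_DIRS):
    u = rng.normal(size=theta_star.shape)
    u /= np.linalg.norm(u)                  # uniform direction on the sphere
    log_radii.append(np.log(boundary_radius(u)))

# For a star-shaped region, vol = V_n * E_u[r(u)^n]; replacing E_u[r^n] with
# exp(n * E_u[log r]) gives a Jensen lower bound with far less variance.
log_unit_ball = (n_params / 2) * np.log(np.pi) - gammaln(n_params / 2 + 1)
log_volume_lb = log_unit_ball + n_params * np.mean(log_radii)
print(f"log-volume lower bound: {log_volume_lb:.1f} nats")
```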
The KL volume is data-dependent, and the ratio of local KL volume between test and train datasets can be used to assess Overfitting: if the validation-to-train ratio is less than 1, it indicates overfitting; if it is close to 1, the fit is near-optimal; and if it is greater than 1, it suggests Underfitting. Using second-moment information from optimizers like the Adam Optimizer reduces directional variance, significantly decreasing the variance of the volume estimate. The negative log of the local volume can be interpreted as the network's information content (from an MDL perspective) and linked to generalization performance. As training progresses toward overfitting, the local volume decreases (complexity increases). A small diagnostic sketch follows below.
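As an illustration of that diagnostic (the function, threshold, and numbers are assumptions, not from the source), comparing log-volume estimates computed on validation versus train inputs reduces to a sign check on the log ratio:

```python
def diagnose(log_vol_valid: float, log_vol_train: float, tol: float = 0.1) -> str:
    """Compare local KL volumes estimated on validation vs. train data.
    Per the text: volume ratio < 1 (log ratio < 0) -> overfitting,
    ratio ~ 1 -> near-optimal, ratio > 1 -> underfitting.
    `tol` is an illustrative tolerance on the log ratio."""
    log_ratio = log_vol_valid - log_vol_train
    if log_ratio < -tol:
        return "overfitting"
    if log_ratio > tol:
        return "underfitting"
    return "near-optimal"

# Illustrative numbers: validation volume is smaller -> "overfitting"
print(diagnose(log_vol_valid=-1520.3, log_vol_train=-1490.7))
```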
