Weight Decay

Creator: Seonglae Cho
Created: 2023 May 11 6:13
Edited: 2024 Mar 11 5:15

Imposing a penalty on the weight size itself

Minimizing the regularized loss is equivalent to first shrinking (decaying) $\theta$ by a scalar factor of $(1 - \eta\lambda)$ and then applying the standard gradient step; that coefficient decays the weights to prevent overfitting.
Training both terms at the same time is a tradeoff between fitting the data and keeping the weights small.

When using the L2 norm, the regularized loss is

$$\tilde{L}(\theta) = L(\theta) + \frac{\lambda}{2}\,\lVert\theta\rVert_2^2$$

and one SGD step becomes

$$\theta \leftarrow \theta - \eta\,\nabla\tilde{L}(\theta) = (1 - \eta\lambda)\,\theta - \eta\,\nabla L(\theta)$$

so the penalty acts as a multiplicative decay on the weights before the ordinary gradient update.
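A minimal numerical check of this equivalence, as a sketch in plain NumPy; the quadratic loss and the values of `eta` and `lam` are illustrative assumptions, not from the source:

```python
import numpy as np

# Illustrative quadratic loss L(theta) = 0.5 * ||A @ theta - b||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))
b = rng.normal(size=8)
theta = rng.normal(size=4)

eta, lam = 0.1, 0.01  # learning rate and weight-decay coefficient

def grad_L(theta):
    # Gradient of the unregularized loss
    return A.T @ (A @ theta - b)

# Path 1: one SGD step on the L2-regularized loss
step_regularized = theta - eta * (grad_L(theta) + lam * theta)

# Path 2: decay the weights first, then take a plain gradient step
step_decayed = (1 - eta * lam) * theta - eta * grad_L(theta)

# Both paths produce the same update
assert np.allclose(step_regularized, step_decayed)
```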

Use case

Since the magnitude of Layer Normalization and bias parameters does not matter in the same way, separate them into their own parameter group and exclude them from weight decay, as in the sketch below.
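A minimal sketch of this grouping with PyTorch's AdamW; the toy model, the decay value, and the dimension/name-based filtering rule are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy model with weight, bias, and LayerNorm parameters
model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.Linear(16, 4))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Biases and LayerNorm gains are 1-D tensors; skip decay for them
    if param.ndim <= 1 or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)
```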

Recommendations