Weight Decay

Creator: Seonglae Cho
Created: 2023 May 11 6:13
Edited: 2024 Mar 11 5:15

Imposing a penalty on the weight size itself

Minimizing the regularized loss is equivalent to first shrinking (decaying) $\theta$ by a scalar factor of $(1 - \eta\lambda)$ and then applying the standard gradient step; that coefficient decays the weights to prevent overfitting.
Training both terms at the same time is a tradeoff between fitting the data and keeping the weights small.

When using the L2 norm, the regularized loss is

$$\tilde{L}(\theta) = L(\theta) + \frac{\lambda}{2}\,\lVert\theta\rVert_2^2$$

and one SGD step becomes

$$\theta \leftarrow \theta - \eta\,\nabla\tilde{L}(\theta) = (1 - \eta\lambda)\,\theta - \eta\,\nabla L(\theta)$$

so the penalty acts as a multiplicative decay on the weights before the ordinary gradient update.
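A minimal numerical check of this equivalence, as a sketch in plain NumPy; the quadratic loss and the values of `eta` and `lam` are illustrative assumptions, not from the source:

```python
import numpy as np

# Illustrative quadratic loss L(theta) = 0.5 * ||A @ theta - b||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))
b = rng.normal(size=8)
theta = rng.normal(size=4)

eta, lam = 0.1, 0.01  # learning rate and weight-decay coefficient

def grad_L(theta):
    # Gradient of the unregularized loss
    return A.T @ (A @ theta - b)

# Path 1: one SGD step on the L2-regularized loss
step_regularized = theta - eta * (grad_L(theta) + lam * theta)

# Path 2: decay the weights first, then take a plain gradient step
step_decayed = (1 - eta * lam) * theta - eta * grad_L(theta)

# Both paths produce the same update
assert np.allclose(step_regularized, step_decayed)
```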

Use case

Since the magnitude of Layer Normalization and bias parameters does not matter in the same way, separate them into their own parameter group and exclude them from weight decay, as in the sketch below.
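A minimal sketch of this grouping with PyTorch's AdamW; the toy model, the decay value, and the dimension/name-based filtering rule are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy model with weight, bias, and LayerNorm parameters
model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.Linear(16, 4))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Biases and LayerNorm gains are 1-D tensors; skip decay for them
    if param.ndim <= 1 or name.endswith(".bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)
```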

Recommendations