Model Optimizer

Creator: Seonglae Cho
Created: 2023 Jun 18 8:47
Edited: 2025 Dec 18 16:16

Update weights & manage learning rate

An algorithm that adjusts a model's parameters to minimize the difference between the predicted output and the actual output on the training data.
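As a minimal sketch of this idea (plain Python, with a hypothetical 1-D quadratic loss standing in for a real model), the optimizer repeatedly updates the weight against the gradient so the prediction error shrinks:

```python
# Minimal gradient descent on a toy loss L(w) = (w - 3)^2,
# whose minimum (the "actual output" being fit) is at w = 3.

def grad(w):
    # dL/dw for L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w = w - lr * grad(w)  # update weight: step against the gradient
    return w

w_final = sgd(w=0.0)  # converges toward the minimum at w = 3
```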
With a fixed learning rate, the model may oscillate or fail to converge. Constant-magnitude gradients, such as the sign function arising from the L1 norm, are one reason adaptive learning rates are used.
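A small sketch of that failure mode (assuming the 1-D loss L(w) = |w|, whose gradient is the constant-magnitude sign function): fixed-step SGD oscillates around the minimum forever, while an Adagrad-style accumulator shrinks the effective step so the iterate settles:

```python
import math

def sign_grad(w):
    # Gradient of L(w) = |w| is sign(w): constant magnitude everywhere
    return 1.0 if w > 0 else -1.0 if w < 0 else 0.0

def sgd_fixed(w, lr=0.5, steps=50):
    # Fixed learning rate: the step never shrinks, so w bounces
    # back and forth across the minimum at 0 without converging.
    for _ in range(steps):
        w -= lr * sign_grad(w)
    return w

def adagrad(w, lr=0.5, steps=50, eps=1e-8):
    # Adagrad-style adaptation: divide by the accumulated gradient
    # norm, so the effective step decays like lr / sqrt(t).
    g2_sum = 0.0
    for _ in range(steps):
        g = sign_grad(w)
        g2_sum += g * g
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w
```

Starting from w = 0.3, the fixed-rate run keeps jumping between 0.3 and -0.2, while the adaptive run ends much closer to the minimum.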
All Neural Network and Model Optimizer can be viewed as Associative Memory that compresses context flow.
Model Optimizers
Model Optimizer Notion

Visualization

Gradient descent visualization - hills
5 gradient descent methods (gradient descent, momentum, adagrad, rmsprop & adam) racing down a terrain with two hills. Software: https://github.com/lilipads/gradient_descent_viz Blog Post: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Adafactor Optimizer for Deep Learning
An introduction to Adafactor, which uses little memory while also finding the learning rate on its own.
[Paper Review] Let's look at AdamW! Decoupled weight decay regularization paper review (1)
재야의 숨은 고수가 되고 싶은 초심자