Update weights & manage learning rate
An algorithm that adjusts the parameters of a model in order to minimize the difference between the predicted output and the actual output of the training data.
With a fixed learning rate, the model may oscillate around the minimum or fail to converge; with constant-magnitude gradients such as the sign subgradient of the L1 loss, the step size never shrinks, which is why adaptive learning rates are used.
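A minimal sketch of this effect (a toy 1D example of my own, not taken from the linked material): minimizing |x| with a fixed step keeps bouncing around zero, while an Adagrad-style per-parameter step shrinks as squared gradients accumulate.

```python
import numpy as np

def sign_grad(x):
    return np.sign(x)  # constant-magnitude subgradient of the L1 loss |x|

# Fixed learning rate: the step size never shrinks, so x oscillates around 0
x, lr = 0.95, 0.1
for _ in range(100):
    x -= lr * sign_grad(x)
print("fixed lr:", x)   # ends at +/-0.05 and keeps bouncing

# Adagrad-style adaptive rate: accumulated squared gradients shrink each step
x, s, eps = 0.95, 0.0, 1e-8
for _ in range(100):
    g = sign_grad(x)
    s += g ** 2
    x -= lr / (np.sqrt(s) + eps) * g
print("adaptive:", x)   # oscillation amplitude decays toward 0
```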
Every Neural Network and Model Optimizer can be viewed as an Associative Memory that compresses the flow of context.
Model Optimizers
Model Optimizer Notion
Visualization
Gradient descent visualization - hills
Five gradient descent methods (plain gradient descent, momentum, AdaGrad, RMSProp & Adam) racing down a terrain with two hills; a minimal re-implementation of the five update rules follows the links below.
Software: https://github.com/lilipads/gradient_descent_viz
Blog Post: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
https://www.youtube.com/watch?v=ilYd4TAzNoU
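For reference, a minimal NumPy re-implementation of the five update rules on a toy surface; the surface, hyperparameters, and starting point are illustrative assumptions, not taken from the visualization.

```python
import numpy as np

def grad(p):
    # Gradient of the toy surface f(x, y) = sin(x) + sin(y) + 0.1 * (x^2 + y^2)
    return np.cos(p) + 0.2 * p

def sgd(p, lr=0.1, steps=200):
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

def momentum(p, lr=0.1, beta=0.9, steps=200):
    v = np.zeros_like(p)
    for _ in range(steps):
        v = beta * v + grad(p)
        p = p - lr * v
    return p

def adagrad(p, lr=0.5, eps=1e-8, steps=200):
    s = np.zeros_like(p)
    for _ in range(steps):
        g = grad(p)
        s = s + g ** 2                       # accumulate all past squared gradients
        p = p - lr * g / (np.sqrt(s) + eps)
    return p

def rmsprop(p, lr=0.1, beta=0.9, eps=1e-8, steps=200):
    s = np.zeros_like(p)
    for _ in range(steps):
        g = grad(p)
        s = beta * s + (1 - beta) * g ** 2   # exponential moving average instead
        p = p - lr * g / (np.sqrt(s) + eps)
    return p

def adam(p, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=200):
    m, v = np.zeros_like(p), np.zeros_like(p)
    for t in range(1, steps + 1):
        g = grad(p)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
        p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p

start = np.array([2.5, 2.5])
for step_fn in (sgd, momentum, adagrad, rmsprop, adam):
    print(step_fn.__name__, step_fn(start))
```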

Adafactor Optimizer for Deep Learning
An overview of Adafactor, which keeps memory usage low while finding the learning rate on its own; its factored second-moment trick is sketched after the link below.
https://heegyukim.medium.com/adafactor-optimizer-for-deep-learning-8268ca91e506
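A minimal sketch of the memory-saving idea the post describes, under my own simplifications (update clipping, the relative step size, and the β2 schedule from the paper are omitted): instead of keeping a full Adam-style second-moment matrix for an m × n weight, Adafactor keeps only row and column accumulators and reconstructs the matrix as a rank-1 outer product.

```python
import numpy as np

def adafactor_step(W, grad, R, C, lr=1e-2, beta2=0.999, eps=1e-30):
    g2 = grad ** 2 + eps
    R = beta2 * R + (1 - beta2) * g2.sum(axis=1)  # per-row accumulator, shape (m,)
    C = beta2 * C + (1 - beta2) * g2.sum(axis=0)  # per-column accumulator, shape (n,)
    V = np.outer(R, C) / R.sum()                  # rank-1 reconstruction of the second moment
    W = W - lr * grad / np.sqrt(V)
    return W, R, C

# Toy usage with synthetic gradients, just to show the shapes involved
m, n = 4, 3
W = np.random.randn(m, n)
R, C = np.zeros(m), np.zeros(n)
for _ in range(10):
    g = np.random.randn(m, n)                     # stand-in gradient, illustrative only
    W, R, C = adafactor_step(W, g, R, C)
print(R.shape, C.shape, W.shape)                  # (4,) (3,) (4, 3)
```

The optimizer state is only the two vectors R and C, i.e. O(m + n) memory instead of the O(mn) that Adam's full second-moment matrix would need.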

[Paper Review] Let's look at AdamW! Decoupled Weight Decay Regularization paper review (1)
A beginner who wants to become a hidden master (blog); the decoupled-versus-L2 weight decay difference it reviews is sketched after the link below.
https://hiddenbeginner.github.io/deeplearning/paperreview/2019/12/29/paper_review_AdamW.html
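A minimal sketch of the distinction the paper reviews, in toy NumPy form with illustrative hyperparameters: "Adam + L2" folds the decay term into the gradient, so it passes through the adaptive denominator, while AdamW subtracts the decay directly from the weights (decoupled).

```python
import numpy as np

def adam_l2_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    g = g + wd * w                               # L2: the decay enters the gradient...
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # ...and is rescaled by sqrt(v_hat)
    return w, m, v

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * wd * w  # decay applied directly
    return w, m, v

# Toy comparison on a single scalar weight with a constant gradient
w1 = w2 = 1.0
m1 = v1 = m2 = v2 = 0.0
for t in range(1, 101):
    g = 0.1                                      # stand-in gradient, illustrative only
    w1, m1, v1 = adam_l2_step(w1, g, m1, v1, t)
    w2, m2, v2 = adamw_step(w2, g, m2, v2, t)
print(w1, w2)  # here the L2 term is largely normalized away by sqrt(v_hat),
               # while AdamW's direct decay still shrinks w2 a bit further
```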

Seonglae Cho