Gradient ascent maximizes a function; gradient descent minimizes it.
Iteratively subtract the derivative of the loss function with respect to each weight, scaled by the learning rate, from that weight: w ← w − η · ∂L/∂w. Each step decreases the cost function.
Repeat until the weights converge (a local extremum; for descent, a local minimum).
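A minimal sketch of this update rule in Python, assuming a one-dimensional quadratic loss L(w) = (w − 3)² with an illustrative starting point and learning rate (none of these specifics come from the notes above):

```python
# Gradient descent sketch: minimize L(w) = (w - 3)^2.
# Loss, starting point, and learning rate are illustrative assumptions.

def loss(w):
    return (w - 3.0) ** 2

def d_loss(w):
    # Analytic derivative: dL/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0      # initial weight
eta = 0.1    # learning rate
for step in range(100):
    grad = d_loss(w)
    w -= eta * grad          # w <- w - eta * dL/dw
    if abs(grad) < 1e-8:     # stop once the update has effectively vanished
        break

print(w)  # approaches the minimizer w = 3
```

Flipping the sign of the update (`w += eta * grad`) turns the same loop into gradient ascent, which climbs toward a local maximum instead.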
Gradient Descent Notion
adventures/dual-numbers-and-gradient-descent.ipynb at master · samuelbelko/adventures
Simple math-oriented projects using Julia and JupyterLab.
Online gradient descent written in SQL • Max Halford
Edit: this post generated a few insightful comments on Hacker News. I’ve also put the code in a notebook for ease of use. Introduction Modern MLOps is complex because it involves too many components. You need a message bus, a stream processing engine, an API, a model store, a feature store, a monitoring service, etc. Sadly, containerisation software and the unbundling trend have encouraged an appetite for complexity. I believe MLOps shouldn’t be this complex.