SGD
Stochastic gradient descent samples a random training example (or mini-batch) for each update instead of computing the gradient over the whole dataset, as batch gradient descent does. The resulting gradient noise is often helpful for robustness and generalization. In many libraries, "SGD" also refers to vanilla gradient descent without momentum.
An alternative to batch gradient descent; it updates the parameters much more frequently
The model's parameters are updated after processing each training example (or mini-batch)
Very scalable, so it is used to train most models
An update is applied at every step rather than once per full pass over the data
The training data need to be reshuffled at the start of each training epoch, as in the sketch below
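A minimal sketch of these notes, assuming a toy linear-regression problem in NumPy; the data, learning rate, and batch size are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # toy features (assumed for illustration)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(3)                                # parameters to learn
lr, batch_size, epochs = 0.05, 32, 10

for epoch in range(epochs):
    idx = rng.permutation(len(X))              # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]      # one random mini-batch
        pred = X[b] @ w
        grad = X[b].T @ (pred - y[b]) / len(b) # MSE gradient on this batch only
        w -= lr * grad                         # vanilla update: no momentum

print(w)                                       # close to true_w after a few epochs
```

Each inner-loop step touches only one mini-batch, which is what makes the method scalable and what introduces the gradient noise described above.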
Stochastic Variational Inference
Parameters are updated in the direction that reduces the KL divergence (KLD)
The KLD expression must be differentiable (see the sketch below)
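A minimal sketch of the KL-descent idea, assuming a univariate Gaussian variational distribution q = N(mu, sigma^2) and a standard normal prior, so the KLD has a closed, differentiable form. The parameter names and toy setup are illustrative assumptions; in full stochastic variational inference the gradient would instead be a noisy estimate from subsampled data or Monte Carlo samples.

```python
import math

mu, rho = 2.0, 0.7            # variational parameters; sigma = exp(rho) keeps sigma > 0
lr = 0.1

def kl(mu, rho):
    sigma = math.exp(rho)
    # closed-form KL( N(mu, sigma^2) || N(0, 1) ) -- differentiable in mu and rho
    return 0.5 * (sigma**2 + mu**2 - 1.0) - math.log(sigma)

for step in range(100):
    sigma = math.exp(rho)
    grad_mu = mu                   # dKL/dmu
    grad_rho = sigma**2 - 1.0      # dKL/drho via the chain rule through sigma = exp(rho)
    mu -= lr * grad_mu             # update in the direction that reduces the KLD
    rho -= lr * grad_rho

print(mu, math.exp(rho), kl(mu, rho))   # mu -> 0, sigma -> 1, KL -> 0
```

The update only works because the KLD above is differentiable with respect to the variational parameters, which is the requirement noted in these lines.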