Stochastic Gradient Descent

SGD

Stochastic gradient descent updates the model on randomly chosen batches of data rather than on the whole dataset at once, as Batch Gradient Descent does. The gradient noise this introduces can act as a regularizer and help model robustness. "SGD" is also commonly used to mean vanilla gradient descent without momentum.
An alternative to Batch Gradient Descent: it updates far more frequently, since the model's parameters are updated after processing each training example (or mini-batch) rather than once per full pass over the dataset. This makes it very scalable, which is why it is used to train most models (see the sketch below).
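
A minimal sketch of that per-batch update loop, assuming an illustrative least-squares model in NumPy (the data, learning rate, and batch size are all made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y ≈ 3x + 1 (purely illustrative)
X = rng.normal(size=1000)
y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=1000)

w, b = 0.0, 0.0            # model parameters
lr, batch_size = 0.1, 32   # assumed hyperparameters

for epoch in range(10):
    # Plain sequential batches here; see the shuffling note below
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        err = (w * xb + b) - yb
        # Gradient of the mean squared error on this batch only
        grad_w = 2.0 * np.mean(err * xb)
        grad_b = 2.0 * np.mean(err)
        # Parameters move after every batch, not once per epoch
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)  # should approach 3.0 and 1.0
```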

The data needs to be reshuffled at the start of each Training Epoch, so that batch composition (and hence the gradient noise) differs from epoch to epoch.
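
One common way to do this, continuing the NumPy sketch above (the array names are the same illustrative ones):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)
y = 3.0 * X + 1.0

for epoch in range(10):
    perm = rng.permutation(len(X))  # fresh order every epoch
    X, y = X[perm], y[perm]         # shuffle inputs and targets together
    # ... then iterate over mini-batches of the shuffled arrays as above ...
```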

Parameters are updated in the direction that reduces the KL divergence (KLD); for this to work, the KLD expression must be differentiable with respect to the parameters.
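
A short derivation of why this reduces to ordinary SGD on the negative log-likelihood, assuming the objective is the forward KL from the data distribution to a model $p_\theta$:

```latex
\nabla_\theta \, \mathrm{KL}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  = \nabla_\theta \, \mathbb{E}_{x \sim p_{\text{data}}}
      \left[ \log p_{\text{data}}(x) - \log p_\theta(x) \right]
  = -\, \mathbb{E}_{x \sim p_{\text{data}}}
      \left[ \nabla_\theta \log p_\theta(x) \right]
```

So an SGD step θ ← θ − η∇θKL is just a stochastic step on the negative log-likelihood estimated from a mini-batch, and the gradient exists only when log p_θ is differentiable in θ.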