Batch Gradient Descent

Creator: Seonglae Cho
Created: 2021 Oct 6 10:3
Edited: 2024 Oct 21 11:35
Batch gradient descent looks at every example in the entire training set on every step. The difference from Gradient Accumulation is that the loss function is evaluated over the entire dataset in a single update. A minimal sketch of this full-batch update is shown below.
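A minimal NumPy sketch of full-batch gradient descent on linear regression with an MSE loss; the dataset, learning rate, and function names here are illustrative assumptions, not part of the original note.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_steps=100):
    """Full-batch gradient descent: every step uses the entire training set."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        # Gradient of the loss (1/2n) * ||Xw - y||^2 over ALL n examples,
        # unlike mini-batch SGD which would use only a subset per step.
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w

# Usage on a small synthetic dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)
print(batch_gradient_descent(X, y))  # ~ [1.0, -2.0, 0.5]
```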
 
The learning rate and mini-batch size affect the smoothness of the loss landscape, which can be analyzed through the Hessian. The largest eigenvalue of the Hessian Matrix describes the direction in which the gradient changes most rapidly, and thus defines a speed limit on the size of each update.
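A hedged illustration of that speed limit (a small NumPy sketch under assumed data, not from the original note): for a quadratic loss the Hessian is constant, and plain gradient descent is only stable while the learning rate stays below 2 / λ_max.

```python
import numpy as np

# Hypothetical ill-conditioned dataset, chosen only to make the effect visible.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 1.0, 0.3])
y = X @ np.array([1.0, 1.0, 1.0, 1.0])

H = X.T @ X / len(X)                   # Hessian of the quadratic MSE-style loss
lam_max = np.linalg.eigvalsh(H).max()  # sharpest curvature direction
print(f"lambda_max = {lam_max:.3f}, speed limit: lr < {2 / lam_max:.3f}")

def final_loss(lr, n_steps=200):
    """Run gradient descent and report the final mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * X.T @ (X @ w - y) / len(X)
    return np.mean((X @ w - y) ** 2)

print("loss at 0.9 * limit:", final_loss(0.9 * 2 / lam_max))  # converges
print("loss at 1.1 * limit:", final_loss(1.1 * 2 / lam_max))  # diverges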