Gradient Accumulation

Creator: Seonglae Cho
Created: 2024 Mar 8 15:20
Edited: 2024 Nov 22 20:45
Simulates a larger batch size by accumulating gradients over several iterations before each weight update, improving training stability and model quality. Useful for training large-scale models in NLP or vision domains where memory constraints restrict the per-step batch size.
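A minimal PyTorch-style sketch of the idea, assuming hypothetical `model`, `dataloader`, `optimizer`, and an `accumulation_steps` of 8:

```python
import torch
import torch.nn.functional as F

accumulation_steps = 8  # effective batch size = micro-batch size * 8

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    loss = F.cross_entropy(outputs, targets)
    # Scale the loss so the accumulated gradient matches a single
    # large-batch update instead of the sum of several updates.
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one weight update per accumulation window
        optimizer.zero_grad()   # clear gradients for the next window
```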
Model Generalization
Good from a generalization standpoint, and it helps that weights are not updated on every micro-batch.
If the accumulation window is set too large, training can fall into a local minimum, so be careful.
Sequence length normalization matters: averaging the loss per micro-batch and then dividing by the number of accumulation steps over-weights tokens from short sequences. To match a true large batch, normalize by the total (non-padded) token count across the whole accumulation window, as in the sketch below.
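A sketch of token-count normalization across one accumulation window; `model`, `micro_batches`, and `optimizer` are hypothetical stand-ins, and padding positions are assumed to be labelled `-100`:

```python
import torch
import torch.nn.functional as F

def accumulate_window(model, micro_batches, optimizer):
    # Count real (non-padded) tokens across the whole window first.
    total_tokens = sum((labels != -100).sum() for _, labels in micro_batches)

    optimizer.zero_grad()
    for input_ids, labels in micro_batches:
        logits = model(input_ids)
        # Sum (not mean) the per-token loss, then divide by the window's
        # total token count so every token is weighted equally, exactly
        # as it would be in one true large batch.
        loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1),
            ignore_index=-100,
            reduction="sum",
        ) / total_tokens
        loss.backward()
    optimizer.step()
```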