Enables training with virtually larger batch sizes by accumulating gradients over several iterations before each optimizer update, improving stability and model quality. Useful for training large-scale models in NLP or vision domains where memory constraints restrict the per-step batch size.
Good for model generalization, and it reduces overhead since the weights are not updated at every step.
Be careful: if the effective batch size is made too large, training can get stuck in a local minimum. A minimal loop illustrating the technique is sketched below.
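A minimal sketch of gradient accumulation in PyTorch; the toy model, synthetic data, and `accumulation_steps` value are illustrative assumptions, not from the source.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                      # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4                        # hypothetical accumulation window

optimizer.zero_grad()
for step in range(16):
    inputs = torch.randn(8, 16)               # micro-batch of 8 samples
    labels = torch.randint(0, 4, (8,))
    loss = nn.functional.cross_entropy(model(inputs), labels)
    # Scale the loss so the summed gradients match one update over 4 * 8 = 32 samples
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                       # one optimizer update per window
        optimizer.zero_grad()
```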
Sequence length normalization matters: when accumulated mini-batches contain different numbers of non-padded tokens, averaging per-batch losses skews the gradient, so the loss should be normalized by the total token count across the accumulation window (see the sketch after the reference below).
Bug Fixes in LLM Training - Gradient Accumulation
Unsloth's Gradient Accumulation fix solves critical errors in LLM Training.
https://unsloth.ai/blog/gradient
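A rough sketch of the denominator issue this fix addresses, assuming per-token cross-entropy losses with padding already removed; the helper names and tensors are hypothetical, not the library's API.

```python
import torch

def accumulated_loss_naive(batch_token_losses, accumulation_steps):
    # Averages each mini-batch's loss, then averages across steps;
    # this over-weights tokens from short sequences.
    return sum(t.mean() for t in batch_token_losses) / accumulation_steps

def accumulated_loss_normalized(batch_token_losses):
    # Sums per-token losses and divides by the total non-padded token count,
    # matching what a single large batch would compute.
    total = sum(t.sum() for t in batch_token_losses)
    n_tokens = sum(t.numel() for t in batch_token_losses)
    return total / n_tokens

short = torch.rand(5)    # 5 non-padded tokens in one mini-batch
long = torch.rand(500)   # 500 non-padded tokens in another
print(accumulated_loss_naive([short, long], accumulation_steps=2))
print(accumulated_loss_normalized([short, long]))
```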

Seonglae Cho