torch Tensor.backward()
Without calling backward(), the computation graph of the loss is kept alive and keeps growing across iterations. This is why, in Gradient Accumulation, backward() is called on every micro-batch's loss: each call adds that micro-batch's gradients into .grad and, by default, frees its computation graph.
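A minimal sketch of this pattern (the model, optimizer, and batch names here are illustrative, not from the original note): backward() is called once per micro-batch so gradients accumulate in .grad while each micro-batch's graph is freed, and the optimizer steps only every few micro-batches.

```python
import torch
import torch.nn as nn

# Illustrative toy setup for gradient accumulation.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 4
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Scale the loss so the accumulated gradient matches a full-batch average.
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps
    # backward() adds this micro-batch's gradients to .grad and, by default
    # (retain_graph=False), frees its computation graph, keeping memory bounded.
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

If the micro-batch losses were instead summed into a single tensor and backward() were called only once at the end, every micro-batch's graph would stay in memory until that one call.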
PyTorch Autograd
