Compute gradient
gradient of current tensor w.r.t. graph leaves
Gradient values keep getting added (accumulated) on each subsequent backward() call, so always zero the gradients before starting backpropagation.
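A minimal sketch of this accumulation behavior (tensor values are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: .grad is populated
(x ** 2).sum().backward()
print(x.grad)      # tensor([2., 4.])

# Second backward pass without zeroing: gradients are added, not replaced
(x ** 2).sum().backward()
print(x.grad)      # tensor([4., 8.])

# Zero the gradient before the next pass (or call optimizer.zero_grad())
x.grad.zero_()
(x ** 2).sum().backward()
print(x.grad)      # tensor([2., 4.])
```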
retain_graph
- Keeps the computational graph in memory so it can be reused for another backward pass
gradient
- The gradient of the differentiated function w.r.t. this tensor (the vector in the vector-Jacobian product); required when the tensor is non-scalar
create_graph
- If True, the graph of the derivative itself is constructed, allowing higher-order derivatives
inputs
- Leaf tensors into whose .grad the gradient is accumulated; all other leaves are ignored (all four arguments are shown in the sketch after this list)
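A minimal sketch of how these arguments behave, based on the standard torch.Tensor.backward() signature (values are illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([2.0], requires_grad=True)

y = w * x ** 2                              # non-scalar output

# gradient: the vector in the vector-Jacobian product; required because y is non-scalar
y.backward(gradient=torch.ones_like(y), retain_graph=True)

# retain_graph=True kept the graph alive, so a second backward pass on y works
y.backward(gradient=torch.ones_like(y))

# create_graph=True builds a graph of the backward pass itself,
# enabling higher-order derivatives
z = (w * x ** 2).sum()
(first,) = torch.autograd.grad(z, x, create_graph=True)
(second,) = torch.autograd.grad(first.sum(), x)
print(second)                               # d²z/dx² = 2w -> tensor([4., 4., 4.])

# inputs: restricts which leaves receive gradients from this call
x.grad = None
z = (w * x ** 2).sum()
z.backward(inputs=[w])                      # only w.grad is updated; x.grad stays None
print(x.grad, w.grad)
```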
With this straightforward PyTorch technique, you can cut GPU memory consumption by up to half, or equivalently roughly double your batch size. Rather than summing multiple losses and then invoking backward() once, invoke backward() on each loss individually. This strategy only helps when you have several losses with separate computational graphs.
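A minimal sketch of the idea (the model, data, and loss names are illustrative), contrasting the summed-loss version with the per-loss backward() version:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
inputs = [torch.randn(4, 10) for _ in range(3)]
targets = [torch.randn(4, 1) for _ in range(3)]
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()

# Memory-hungry version (kept for contrast): every loss's graph stays alive
# until the single backward() call at the end.
# total_loss = sum(criterion(model(x), t) for x, t in zip(inputs, targets))
# total_loss.backward()

# Memory-friendly version: call backward() on each loss as soon as it is
# computed, so its graph can be freed immediately; gradients accumulate in .grad.
for x, t in zip(inputs, targets):
    loss = criterion(model(x), t)
    loss.backward()

optimizer.step()
```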