Activation Checkpointing

Creator
Seonglae Cho
Created
2024 May 6 10:49
Edited
2024 May 6 11:46
Refs

Gradient checkpointing

A technique that reduces memory usage by discarding the activations of selected layers during the forward pass and recomputing them during the backward pass. Rather than storing every activation in the network, only a subset is kept; the discarded activations are restored by recomputation when gradients are propagated.
When a module is designated as a checkpoint, only that module's inputs and outputs remain in memory at the end of the forward pass; all intermediate tensors produced inside the module are freed as the forward pass proceeds. Those tensors are recomputed during the backward pass when the checkpointed module is reached. By that point, the layers after the checkpoint have already completed their backward pass, so the peak memory usage is reduced.
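The recompute-on-backward idea above can be sketched in plain Python. This is an illustrative toy, not a real autodiff implementation: a chain of identical "layers" is run forward while keeping only the activations at checkpoint boundaries, and the intermediates inside a segment are rebuilt from the saved boundary when the backward pass would need them. The function names (`forward_checkpointed`, `recompute_segment`) and the scalar layer are invented for this sketch.

```python
def layer(x):
    # Toy "layer": a simple scalar function standing in for a network layer
    return 2 * x + 1

layers = [layer, layer, layer, layer]

def forward_checkpointed(x, layers, ckpt_every=2):
    """Run the chain, saving only activations at checkpoint boundaries."""
    saved = [x]                      # inputs to each checkpointed segment
    for i, f in enumerate(layers):
        x = f(x)
        if (i + 1) % ckpt_every == 0 and i + 1 < len(layers):
            saved.append(x)          # keep this boundary activation only
    return x, saved                  # intermediates inside segments were dropped

def recompute_segment(x, segment):
    """During backward, rebuild the intermediates of one segment."""
    acts = [x]
    for f in segment:
        x = f(x)
        acts.append(x)
    return acts

out, saved = forward_checkpointed(3, layers)   # saved holds boundaries, not all activations
# Backward over the last segment: recompute its intermediates from the
# last saved boundary instead of having stored them during the forward pass.
acts = recompute_segment(saved[-1], layers[2:])
assert acts[-1] == out               # recomputation reproduces the forward result
```

In PyTorch the same pattern is provided by `torch.utils.checkpoint.checkpoint`, which wraps a module or function so that its internal activations are freed during forward and recomputed during backward.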
