In PyTorch, if you never call Tensor.backward(), the computation graph built for each loss stays alive, so accumulating losses across iterations also accumulates their graphs (and the memory they hold).
This is why gradient accumulation calls backward() on every micro-batch: backward() frees that batch's graph and only adds into the parameters' .grad buffers, instead of keeping every graph around for one final backward pass. A minimal sketch of that loop is below.
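A minimal sketch of the accumulation loop, with a placeholder model, data, and accum_steps value (all assumptions, not from the original note). Calling backward() inside the loop releases each micro-batch's graph immediately; summing the losses and calling backward() once at the end would instead keep all accum_steps graphs in memory.

import torch

model = torch.nn.Linear(10, 1)                    # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
accum_steps = 4                                    # micro-batches per optimizer step

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(8, 10)                         # stand-in micro-batch
    y = torch.randn(8, 1)
    loss = loss_fn(model(x), y) / accum_steps      # scale so summed grads match one big batch
    loss.backward()                                # frees this micro-batch's graph, adds into .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                           # apply the accumulated gradients
        optimizer.zero_grad()                      # reset .grad for the next group of micro-batches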
Custom autograd
import torch
from torch.autograd import Function

class CustomReLU(Function):
    @staticmethod
    def forward(ctx, input):
        # Save the input so backward() can tell which elements were negative
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # Modified gradient for negative values: leak 10% of the upstream
        # gradient instead of zeroing it (a standard ReLU backward would use 0)
        grad_input[input < 0] *= 0.1
        return grad_input
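A quick usage sketch (the input values are arbitrary, chosen only for illustration): the Function is called through its apply method, and negative inputs receive the leaked gradient instead of zero.

x = torch.tensor([-2.0, -0.5, 0.0, 1.5], requires_grad=True)
y = CustomReLU.apply(x)      # forward: ordinary ReLU output
y.sum().backward()           # backward: uses the custom gradient above
print(y)                     # tensor([0.0, 0.0, 0.0, 1.5], grad_fn=...)
print(x.grad)                # tensor([0.1, 0.1, 1.0, 1.0]): negative inputs get 0.1 instead of 0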