Tracking gradient norms during training helps identify vanishing or exploding gradients, which can destabilize training. By monitoring the gradient norm of each parameter, you can verify that gradients are flowing properly through the network.
After computing the loss and calling `loss.backward()`, each parameter's gradient is accessible via `param.grad`. The `.norm()` method computes the Euclidean (L2) norm of the gradient tensor, giving a scalar that represents the magnitude of the gradients for that parameter.

```python
for epoch in range(num_epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()

        # Inspect each parameter's gradient norm before the update
        for name, param in model.named_parameters():
            if param.grad is not None:
                print(f"Gradient norm for {name}: {param.grad.norm()}")

        optimizer.step()
```