Exploding gradient

Creator: Seonglae Cho
Created: 2024 Oct 21 11:35
Edited: 2025 May 30 17:38

Exploding gradients arise from the repeated multiplication of weight matrices through the layers of a deep network.

When the largest eigenvalue of the weight matrix is larger than 1, repeated multiplication makes the gradient grow exponentially (Exploding gradient); when it is smaller than 1, the gradient shrinks exponentially (Vanishing Gradient).

Gradient information must pass through the network at a sufficient scale: not too much (Exploding gradient), not too little (Vanishing Gradient).

Exploding gradient and Vanishing Gradient typically occur due to non-linear components, though deep stacks of purely linear transformations can also be problematic.

Trace anomaly

PyTorch's anomaly detection flags the backward operation that produced a NaN or Inf and prints the traceback of the forward call that created it (it adds significant runtime overhead, so enable it only while debugging):

with torch.autograd.set_detect_anomaly(True):
    loss.backward()

Recommendations