Exploding gradient

Creator: Seonglae Cho
Created: 2024 Oct 21 11:35
Edited: 2025 Nov 11 22:39

Due to the repeated multiplication of weight matrices during backpropagation, an Exploding gradient occurs when the largest Eigenvalue is larger than 1, while a Vanishing Gradient happens when it is smaller than 1.
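This eigenvalue condition can be sketched with a 1-D toy model in which a single scalar weight `w` stands in for the eigenvalue (an illustrative assumption, not a full network):

```python
def gradient_norm(w: float, depth: int) -> float:
    """Backpropagated gradient magnitude after `depth` layers that each
    multiply by the same weight w (w stands in for the eigenvalue)."""
    g = 1.0
    for _ in range(depth):
        g *= abs(w)  # chain rule: one factor of w per layer
    return g

print(gradient_norm(1.1, 50))  # eigenvalue > 1: explodes (~117x)
print(gradient_norm(0.9, 50))  # eigenvalue < 1: vanishes (~0.005x)
print(gradient_norm(1.0, 50))  # eigenvalue = 1: preserved
```

Even a modest deviation from 1 compounds exponentially with depth, which is why depth amplifies the problem.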
Gradient information must pass through the network sufficiently: not too much (Exploding gradient), not too little (Vanishing Gradient).
Exploding gradient and Vanishing Gradient typically occur due to non-linear components, though deep stacks of linear transformations can also be problematic.
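As a concrete instance of the non-linear case, the sigmoid derivative never exceeds 0.25, so backpropagating through many sigmoid layers shrinks the gradient geometrically (a minimal sketch, assuming saturating sigmoid activations):

```python
import math

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

print(sigmoid_grad(0.0))  # 0.25, the maximum (at x = 0)
print(sigmoid_grad(5.0))  # near 0 once the unit saturates
print(0.25 ** 20)         # best-case gradient factor across 20 sigmoid layers
```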

Ordered - Vanishing Gradient
Chaotic - Exploding gradient
Edge of Chaos - the critical boundary between the two regimes

Therefore, when performing Weight Initialization, choosing a scale that keeps the eigenvalue spectrum near 1 ensures that gradients are stably propagated, latent representations are both expressive and stable, and the network reaches the critical learning regime (edge of chaos).
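A minimal pure-Python sketch of this idea, assuming a Xavier/Glorot-style scale (variance 1/fan_in) for a deep linear stack; the output magnitude then stays O(1) instead of exploding or vanishing:

```python
import math
import random

def init_layer(fan_in: int, fan_out: int, rng: random.Random):
    """Xavier/Glorot-style init: std = 1/sqrt(fan_in) keeps the
    activation (and gradient) scale roughly constant per layer."""
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

def forward(x, layers):
    """Apply each weight matrix in turn (no non-linearity, for clarity)."""
    for W in layers:
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
    return x

rng = random.Random(0)
n, depth = 128, 20
layers = [init_layer(n, n, rng) for _ in range(depth)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = forward(x, layers)
rms = math.sqrt(sum(v * v for v in y) / n)
print(rms)  # stays O(1) across 20 layers
```

With a naive scale (e.g. a fixed std of 0.5 regardless of fan_in), the same 20-layer stack would blow up by many orders of magnitude.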
 
 

Trace anomaly

 
 
 
 
