Due to the repeated multiplication of weight matrices during backpropagation, the effect of the weight matrix's spectral radius compounds, because the gradient increasingly accumulates along the dominant eigenvector direction.
When the largest eigenvalue (spectral radius) is greater than 1, the gradient explodes (Exploding Gradient); when it is smaller than 1, the gradient vanishes (Vanishing Gradient).
Gradient information needs to pass through the network in just the right measure: not too much (Exploding Gradient) and not too little (Vanishing Gradient).
Exploding and Vanishing Gradients typically arise from non-linear components, though deep stacks of linear transformations can also be problematic.
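A minimal numpy sketch of the purely linear case (depth, width, and the spectral-radius values are illustrative assumptions): a gradient is backpropagated through a stack of identical linear layers whose weight matrix is rescaled to a chosen spectral radius.

```python
import numpy as np

def backprop_through_linear_stack(spectral_radius, depth=50, dim=64, seed=0):
    """Rescale a random weight matrix to the given spectral radius, then
    backpropagate a gradient vector through `depth` copies of that layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    grad = rng.standard_normal(dim)
    for _ in range(depth):
        grad = W.T @ grad                      # one linear layer's backward pass
    return np.linalg.norm(grad)

for rho in (0.9, 1.0, 1.1):
    print(f"spectral radius {rho}: gradient norm after 50 layers = "
          f"{backprop_through_linear_stack(rho):.3e}")
# rho < 1 -> norm shrinks toward 0 (vanishing); rho > 1 -> norm blows up (exploding)
```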
While stacking more layers increases representational capacity and should, in principle, improve learning, in practice deeper networks often train poorly. This is the vanishing gradient phenomenon, where gradient values become extremely small as they propagate backward away from the output layer. It occurs when the derivative of the Activation Function is much smaller than 1 (as with saturating functions such as Sigmoid), so the product of many such derivatives shrinks toward zero. The problem was initially mitigated by switching to the Tanh Function and later largely resolved by adopting non-saturating nonlinearities such as ReLU.
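A small numpy sketch (layer count, width, and the stand-in pre-activations are assumptions) comparing how the product of per-layer Jacobian factors shrinks under a saturating Sigmoid derivative versus ReLU's 0/1 derivative:

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)              # at most 0.25, saturates for large |x|

def relu_grad(x):
    return (x > 0).astype(float)      # 0 or 1, never saturates

def backprop_norm(activation_grad, depth=30, dim=128, seed=0):
    """Backpropagate a gradient through `depth` layers, multiplying by W^T and
    the elementwise activation derivative each step, and return its final norm."""
    rng = np.random.default_rng(seed)
    grad = rng.standard_normal(dim)
    for _ in range(depth):
        W = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # unit-scale weights
        pre_act = rng.standard_normal(dim)                    # stand-in pre-activations
        grad = (W.T @ grad) * activation_grad(pre_act)
    return np.linalg.norm(grad)

print("sigmoid:", backprop_norm(sigmoid_grad))   # collapses toward 0 quickly
print("relu   :", backprop_norm(relu_grad))      # decays far more slowly
# With He-scaled weights (std sqrt(2/dim)) the ReLU norm would stay roughly constant.
```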
Vanishing gradients are desirable to some extent, since it is reasonable to assume that information from nearby timesteps is more useful than information from distant ones. In that sense, vanishing gradients are acceptable when the distant information is not relevant.
Ordered regime - Vanishing Gradient
Chaotic regime - Exploding Gradient
Edge of Chaos - the critical regime between the two, where gradients propagate stably
Therefore, when performing Weight Initialization, choosing the weight scale appropriately (spectral radius near 1) ensures that gradients propagate stably, latent representations are both expressive and stable, and the network reaches the critical learning regime (edge of chaos).
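A minimal sketch (width, depth, and the two non-critical scale values are illustrative assumptions) of how the initialization scale alone places a deep ReLU network in the ordered, chaotic, or critical regime; sqrt(2/fan_in) is the He scaling that sits near the edge of chaos:

```python
import numpy as np

DIM = 256
# Per-layer weight std for a 256-wide ReLU MLP: smaller scales contract the
# signal (ordered/vanishing), larger ones amplify it (chaotic/exploding).
SCALES = {
    "ordered (vanishing)":      0.05,
    "critical (edge of chaos)": np.sqrt(2.0 / DIM),   # He initialization
    "chaotic (exploding)":      0.15,
}

def forward_rms(std, depth=50, dim=DIM, seed=0):
    """Push a random input through `depth` ReLU layers and return the RMS
    activation, a proxy for how well signals (and gradients) propagate."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    for _ in range(depth):
        W = rng.normal(0.0, std, size=(dim, dim))
        x = np.maximum(W @ x, 0.0)
    return np.sqrt(np.mean(x ** 2))

for name, std in SCALES.items():
    print(f"{name:>28}: RMS after 50 layers = {forward_rms(std):.3e}")
```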
[Deep Learning] The Meaning of Vanishing Gradient and How to Solve It
https://heytech.tistory.com/388

Seonglae Cho