Natural Gradient

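Unlike ordinary gradient descent,

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t),$$

natural gradient descent preconditions the gradient with the inverse Fisher Information Matrix:

$$\theta_{t+1} = \theta_t - \eta \, F^{-1} \nabla_\theta L(\theta_t),$$

where $F = \mathbb{E}_{x \sim p_\theta}\left[\nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^\top\right]$ is the Fisher Information Matrix and $\eta$ is the learning rate.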

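Natural gradient doesn't simply follow the steepest direction in parameter space. Instead, it chooses the update direction by considering the probabilistic structure of the model: the Fisher Information Matrix reflects the curvature of the model's probability distribution, so the step is steepest with respect to how much the distribution changes (measured by KL divergence) rather than how much the parameters change.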
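To make the effect of the Fisher preconditioning concrete, here is a minimal sketch that fits a one-dimensional Gaussian N(mu, sigma^2) by maximum likelihood. It assumes the (mu, sigma) parameterization, for which the Fisher matrix is known in closed form as diag(1/sigma^2, 2/sigma^2). The function names, toy data, learning rate, and starting point are all illustrative choices, not taken from any particular library.

```python
import math

def nll_grad(mu, sigma, data):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    n = len(data)
    mean_diff = sum(x - mu for x in data) / n
    mean_sq = sum((x - mu) ** 2 for x in data) / n
    g_mu = -mean_diff / sigma**2
    g_sigma = 1.0 / sigma - mean_sq / sigma**3
    return g_mu, g_sigma

def natural_grad(mu, sigma, data):
    """Gradient preconditioned by the inverse Fisher matrix.

    For the (mu, sigma) parameterization F = diag(1/sigma^2, 2/sigma^2),
    so F^{-1} = diag(sigma^2, sigma^2 / 2).
    """
    g_mu, g_sigma = nll_grad(mu, sigma, data)
    return sigma**2 * g_mu, 0.5 * sigma**2 * g_sigma

def fit(direction_fn, lr, steps, data, mu=0.0, sigma=10.0):
    """Run `steps` updates of theta <- theta - lr * direction_fn(theta)."""
    for _ in range(steps):
        d_mu, d_sigma = direction_fn(mu, sigma, data)
        mu -= lr * d_mu
        sigma -= lr * d_sigma
    return mu, sigma

# Toy data (assumed for illustration): sample mean 5.0, sample variance 0.5.
data = [4.0, 5.0, 6.0, 5.5, 4.5]

print("plain gradient:  ", fit(nll_grad, lr=0.1, steps=300, data=data))
print("natural gradient:", fit(natural_grad, lr=0.1, steps=300, data=data))
```

In this setup the natural gradient step for mu reduces to `mu += lr * (mean(data) - mu)`, independent of sigma, whereas the plain gradient is scaled by 1/sigma^2 and makes much slower progress when the initial sigma is large.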

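Computing the inverse of the Fisher Information Matrix is computationally challenging due to its size: for a model with n parameters it is an n × n matrix, so storing it takes O(n²) memory and inverting it directly takes O(n³) time. Therefore, various approximation methods are used to estimate the inverse matrix.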
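One common family of approximations, which K-FAC belongs to, models a layer's Fisher block as a Kronecker product of two small matrices, so the large inverse reduces to two small solves. The sketch below only illustrates that identity with NumPy. The random activations and output gradients, the layer sizes, and the damping value are stand-in assumptions, and it is not a full K-FAC implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, batch = 4, 3, 256

# Hypothetical per-example layer inputs and back-propagated output gradients.
acts = rng.normal(size=(batch, n_in))
grads = rng.normal(size=(batch, n_out))

damping = 1e-2  # assumed damping term that keeps both factors invertible
A = acts.T @ acts / batch + damping * np.eye(n_in)     # input second moment
G = grads.T @ grads / batch + damping * np.eye(n_out)  # output-gradient second moment

grad_W = grads.T @ acts / batch  # weight gradient, shape (n_out, n_in)

# Factored solve: G^{-1} grad_W A^{-1}, using two small linear systems.
left = np.linalg.solve(G, grad_W)          # G^{-1} grad_W
nat_grad = np.linalg.solve(A, left.T).T    # (...) A^{-1}, valid since A is symmetric

# Reference: build the full (n_in*n_out) x (n_in*n_out) Kronecker matrix and solve.
F_full = np.kron(A, G)
reference = np.linalg.solve(F_full, grad_W.flatten(order="F"))
reference = reference.reshape((n_out, n_in), order="F")

print("max abs difference:", np.max(np.abs(nat_grad - reference)))
```

The factored path never forms the full matrix: it solves one n_out × n_out system and one n_in × n_in system, which is where the savings come from when layers are wide.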

Natural Gradient Approximations

K-FAC

EK-FAC
