A newer improvement over Post-Norm. It is stable because the derivative term is simpler than in Post-Norm, with no additional multiplication. The resulting constant gradients enable larger learning rates.
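The point about the derivative term can be checked numerically. What follows is only a minimal sketch, and it rests on assumptions that are not stated in the text: that the method being contrasted with Post-Norm is Pre-Norm, i.e. x + F(LN(x)) versus LN(x + F(x)); that LayerNorm has no learned gain or bias; and that the sublayer F can be stood in for by a hypothetical elementwise function (`sublayer` below), not a real attention or feed-forward layer. Jacobians are estimated with central finite differences.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias (a simplification for this sketch)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    """Hypothetical stand-in for F (attention/FFN): a fixed elementwise nonlinearity."""
    return [0.5 * math.tanh(v) for v in x]

def pre_norm_block(x):
    # Assumed Pre-Norm form: x + F(LN(x)). The residual path bypasses the normalization.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x):
    # Post-Norm form: LN(x + F(x)). The residual path passes through the normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def jacobian(f, x, h=1e-6):
    """Central finite-difference Jacobian; jac[i][j] approximates d f_i / d x_j."""
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    # cols[j] holds column j; transpose into row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

x0 = [0.3, -1.2, 0.8, 2.0]  # arbitrary illustrative input
n = len(x0)

j_pre = jacobian(pre_norm_block, x0)
j_branch = jacobian(lambda x: sublayer(layer_norm(x)), x0)
j_post = jacobian(post_norm_block, x0)

# Pre-Norm: Jacobian = I + (branch Jacobian), so removing the branch leaves the identity.
identity_part = [[j_pre[i][j] - j_branch[i][j] for j in range(n)] for i in range(n)]

# Post-Norm: the whole update is multiplied by the LayerNorm Jacobian. Because the
# normalized outputs always have zero mean, every column of the Jacobian sums to ~0,
# so no unmodified identity term survives.
post_col_sums = [sum(j_post[i][j] for i in range(n)) for j in range(n)]

print("Pre-Norm Jacobian minus branch Jacobian (approx. identity):")
for row in identity_part:
    print([round(v, 4) for v in row])
print("Post-Norm Jacobian column sums (approx. zero):", [round(s, 6) for s in post_col_sums])
```

Under these assumptions, the Pre-Norm block's derivative is an identity term plus the branch term, while the Post-Norm block's derivative carries an extra LayerNorm Jacobian factor on every path. Stacking layers multiplies those factors together in Post-Norm, which is one way to read the "additional multiplication" mentioned above.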