Variational Stochastic Gradient Descent
Using Stochastic Variational Inference (SVI), the method treats the true gradient g_t and the noisy mini-batch gradient ĝ_t as random variables, modeling them with Gaussian and Gamma distributions in a Bayesian framework. At each iteration, SVI estimates the posterior mean of the true gradient, which drives the parameter update, while simultaneously learning the precision of the gradient noise. Experiments on CIFAR-100, Tiny ImageNet, and ImageNet showed better generalization and faster convergence than Adam and SGD. The method can be viewed as a generalization of Adam, SGDM, and AMSGrad, and achieves better model performance at convergence.
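To make the idea concrete, below is a minimal NumPy sketch of a VSGD-style update, not the paper's exact SVI derivation. The class name VSGDSketch, the hyperparameters (prior_precision, a0, b0), and the closed-form precision-weighted update are illustrative assumptions; they capture the core mechanism described above: a Gaussian posterior mean over the true gradient, combined with a Gamma model of the noise precision.

```python
import numpy as np

class VSGDSketch:
    """Hypothetical, simplified VSGD-style optimizer (not the paper's exact rules).

    The true gradient is treated as a Gaussian random variable observed
    through the noisy mini-batch gradient; the noise precision follows a
    Gamma model. Each step combines the previous gradient estimate with
    the new observation, weighted by their precisions, then descends
    along the resulting posterior mean.
    """

    def __init__(self, params, lr=1e-2, prior_precision=1.0, a0=1.0, b0=1.0):
        self.params = params              # parameter vector (NumPy array)
        self.lr = lr
        self.mu = np.zeros_like(params)   # running posterior mean of the true gradient
        self.lam_g = prior_precision      # precision of the gradient prior
        self.a = a0                       # Gamma shape for the noise precision
        self.b = b0                       # Gamma rate for the noise precision

    def step(self, ghat):
        # Posterior mean of the noise precision under Gamma(a, b): E = a / b
        lam_n = self.a / self.b
        # Gaussian posterior mean of the true gradient given ghat:
        # precision-weighted combination of the prior mean and the observation
        post_mu = (self.lam_g * self.mu + lam_n * ghat) / (self.lam_g + lam_n)
        # Update the Gamma pseudo-counts with the squared residual,
        # so the learned noise precision tracks the observed gradient noise
        resid2 = np.mean((ghat - post_mu) ** 2)
        self.a += 0.5
        self.b += 0.5 * resid2
        # Descend along the denoised gradient estimate
        self.params -= self.lr * post_mu
        self.mu = post_mu
        return self.params

# Usage: minimize f(x) = x^2 from noisy gradients (gradient 2x plus Gaussian noise)
rng = np.random.default_rng(0)
opt = VSGDSketch(params=np.array([5.0]), lr=0.1)
for t in range(200):
    ghat = 2.0 * opt.params + rng.normal(0.0, 1.0, size=1)
    opt.step(ghat)
print(opt.params)  # approaches 0 despite the noisy gradient observations
```

The precision-weighted average is what distinguishes this family from a plain momentum EMA: the trust placed in each new noisy gradient adapts through the learned noise precision rather than being fixed by a hand-tuned decay constant.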