The importance sampling estimator for $\mathbb{E}[f(X)]$ performs better than sampling from the original distribution when it has lower variance. For comparison, the variance of $I = \frac{1}{N}\sum_i f(x_i)$ with $x_i \sim p$ is $\frac{1}{N}\mathbb{V}_p(f(X))$, while the variance of $\frac{1}{N}\sum_i f(x_i)\,\frac{p(x_i)}{g(x_i)}$ with $x_i \sim g$ is $\frac{1}{N}\mathbb{V}_g\!\left(f(X)\,\frac{p(X)}{g(X)}\right)$ — with lower variance being preferable.
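As a concrete illustration, here is a minimal sketch (assuming NumPy and SciPy) comparing the per-sample variance of the two estimators on a toy rare-event problem; the target $p = \mathcal{N}(0,1)$, proposal $g = \mathcal{N}(3,1)$, and $f(x) = \mathbf{1}[x > 3]$ are illustrative choices, not from the text:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 10_000

# Toy target quantity: E[f(X)] = P(X > 3) for X ~ N(0, 1), a rare event.
f = lambda x: (x > 3).astype(float)

# Plain Monte Carlo: sample directly from p = N(0, 1).
x_p = rng.normal(0.0, 1.0, N)
plain = f(x_p)                      # each term has variance V_p(f(X))

# Importance sampling: sample from g = N(3, 1), reweight by p/g.
x_g = rng.normal(3.0, 1.0, N)
w = norm.pdf(x_g, 0, 1) / norm.pdf(x_g, 3, 1)
weighted = f(x_g) * w               # each term has variance V_g(f(X) p(X)/g(X))

print(f"true value        {1 - norm.cdf(3):.6f}")
print(f"plain MC estimate {plain.mean():.6f}  (per-sample var {plain.var():.2e})")
print(f"IS estimate       {weighted.mean():.6f}  (per-sample var {weighted.var():.2e})")
```

Because $g$ concentrates samples where $f$ is nonzero, the weighted terms vary far less than the mostly-zero plain terms, so the IS estimate is much more accurate at the same $N$.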
In short, Monte Carlo methods enable us to estimate any integral by random sampling. In Bayesian Statistics, the Evidence is also a form of integral, so it becomes tractable.
Self-normalized importance sampling
The weight of a sample is its probability under the target, $p(x, y)$, divided by its probability under the proposal distribution $g$.
Small setback: in the particular case where the target we integrate against is the posterior $p(x \mid y)$, we can only evaluate the weight $W(x) = \frac{p(x \mid y)}{g(x)}$ up to a constant, since the Evidence $p(y)$ is unknown.
We therefore define unnormalized weights $w(x) = \frac{p(x, y)}{g(x)}$, which can also be used to approximate the Evidence:
$$p(y) = \int p(x, y)\,dx = \int g(x)\,\frac{p(x, y)}{g(x)}\,dx = \mathbb{E}_g[w(X)]$$
Use the unnormalized importance weights, computed from the same samples $x_i$ drawn from $g$, to estimate both the numerator and the denominator:
$$\mathbb{E}[f(X) \mid y] = \frac{\mathbb{E}_g[w(X)\,f(X)]}{\mathbb{E}_g[w(X)]} \approx \frac{\sum_i w(x_i)\,f(x_i)}{\sum_i w(x_i)}$$
Using the prior $p(x)$ as the proposal distribution $g(x)$, we can weight samples by their likelihood $p(y \mid x)$. This provides a softer approach compared to Rejection Sampling, which uses hard rejection/acceptance when sampling latent variables.
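A minimal sketch of self-normalized importance sampling with the prior as proposal, on a hypothetical conjugate Normal model where the Evidence and posterior mean are known in closed form (the model and all numbers are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy model: x ~ N(0, 1) prior, y | x ~ N(x, 1) likelihood, observed y = 2.
# Closed form: posterior N(y/2, 1/2), Evidence p(y) = N(y; 0, sqrt(2)).
y_obs = 2.0
N = 50_000

# Proposal g = prior, so the unnormalized weight w(x) = p(x, y)/g(x)
# reduces to the likelihood p(y | x).
x = rng.normal(0.0, 1.0, N)
w = norm.pdf(y_obs, loc=x, scale=1.0)

# Evidence: p(y) = E_g[w(X)], estimated by the plain mean of the weights.
evidence = w.mean()

# Posterior mean: self-normalized ratio sum(w f) / sum(w) with f(x) = x.
post_mean = np.sum(w * x) / np.sum(w)

print(f"evidence       {evidence:.4f}  (exact {norm.pdf(y_obs, 0, np.sqrt(2)):.4f})")
print(f"posterior mean {post_mean:.4f}  (exact {y_obs / 2:.4f})")
```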
In Importance sampling practice, a good proposal must be close to the posterior, which might be quite different from the prior. MCMC algorithms instead draw samples from the target distribution by performing a biased Random Walk over the space of latent variables $x$.
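For contrast, here is a minimal random-walk Metropolis sketch (one common MCMC algorithm; the step size and the toy log-joint below are illustrative assumptions). Note that it only ever evaluates the unnormalized log density $\log p(x, y)$:

```python
import numpy as np

def random_walk_metropolis(log_p, x0, n_steps, step=0.5, rng=None):
    """Minimal random-walk Metropolis: a biased random walk whose
    stationary distribution is proportional to exp(log_p(x)).
    log_p only needs to be known up to a constant, e.g. log p(x, y)."""
    rng = rng or np.random.default_rng()
    x, lp = x0, log_p(x0)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        x_new = x + step * rng.normal()          # propose a local move
        lp_new = log_p(x_new)
        if np.log(rng.uniform()) < lp_new - lp:  # accept with prob min(1, ratio)
            x, lp = x_new, lp_new
        samples[t] = x                           # rejected moves repeat x
    return samples

# Illustrative: unnormalized log-posterior of the toy Normal model (y = 2).
log_joint = lambda x: -0.5 * x**2 - 0.5 * (2.0 - x)**2
draws = random_walk_metropolis(log_joint, x0=0.0, n_steps=20_000)
print(f"MCMC posterior mean {draws[5000:].mean():.3f}  (exact 1.0)")
```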
Annealed Importance Sampling (AIS)
Instead of using a fixed proposal distribution, samples are drawn through a process of gradually changing distributions, transitioning from an easy initial distribution to the target distribution through a sequence of intermediate distributions, with MCMC transitions applied at each step.
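A compact AIS sketch under the common geometric path $p_\beta(x) \propto p(x)\,p(y \mid x)^\beta$, reusing the toy Normal model from above; the temperature schedule, step size, and chain count are illustrative choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Same toy model: prior N(0, 1), likelihood N(y | x, 1), observed y = 2.
log_prior = lambda x: norm.logpdf(x, 0, 1)
log_lik   = lambda x: norm.logpdf(2.0, x, 1)

def ais(n_chains=2000, betas=np.linspace(0, 1, 50), step=0.5):
    """Annealed importance sampling along the geometric path
    p_beta(x) ∝ prior(x) * likelihood(x)^beta, from prior to posterior."""
    x = rng.normal(0, 1, n_chains)           # exact samples from p_0 = prior
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * log_lik(x)   # incremental importance weight
        # One Metropolis step targeting p_b keeps the chains near p_b.
        x_new = x + step * rng.normal(size=n_chains)
        log_ratio = (log_prior(x_new) + b * log_lik(x_new)
                     - log_prior(x) - b * log_lik(x))
        accept = np.log(rng.uniform(size=n_chains)) < log_ratio
        x = np.where(accept, x_new, x)
    return x, log_w

x, log_w = ais()
# The mean of the weights estimates the Evidence; average in log space
# (logsumexp-style) for numerical stability.
log_evidence = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print(f"AIS log-evidence {log_evidence:.4f}  "
      f"(exact {norm.logpdf(2.0, 0, np.sqrt(2)):.4f})")
```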
Considering the data according to the probability of trajectory occurrence makes it mathematically sound to use Off-policy data: we can utilize more data while weighting each trajectory by its importance, i.e., the ratio of its probability under the target policy to its probability under the behavior policy.
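A minimal sketch of the single-step case (a toy two-armed bandit; the policies and payoffs are invented for illustration). For multi-step trajectories the weight becomes the product of per-step ratios $\prod_t \frac{\pi_{\text{target}}(a_t \mid s_t)}{\pi_{\text{behavior}}(a_t \mid s_t)}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 2-armed bandit: action 0 pays about 0, action 1 pays about 1.
n = 10_000
pi_behavior = np.array([0.7, 0.3])   # policy that generated the data
pi_target   = np.array([0.2, 0.8])   # policy we want to evaluate

actions = rng.choice(2, size=n, p=pi_behavior)
rewards = actions + 0.1 * rng.normal(size=n)

# Per-sample importance weight: probability of the chosen action under
# the target policy divided by its probability under the behavior policy.
w = pi_target[actions] / pi_behavior[actions]

print(f"behavior-policy return {rewards.mean():.3f}")
print(f"off-policy estimate    {(w * rewards).mean():.3f}  "
      f"(exact {pi_target[1]:.3f})")
```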