Posterior Predictive Distribution
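Unlike the prior or the posterior, which are distributions over the parameters, the posterior predictive distribution is the distribution of new data given the existing data. Computing it involves the entire posterior distribution rather than a single optimal parameter value, so it accounts for model uncertainty. This is similar to the structure used in model-based RL.
$$p(x_{\text{new}} \mid D) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid D)\, d\theta$$
- $p(x_{\text{new}} \mid \theta)$ is the likelihood of the new data given the parameter
- $p(\theta \mid D)$ is the posterior distribution of the parameter given the data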
How to derive this form
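Let's start with the joint probability of the new data $x_{\text{new}}$ and the parameter $\theta$ given the observed data $D$, factored with the product rule:
$$p(x_{\text{new}}, \theta \mid D) = p(x_{\text{new}} \mid \theta, D)\, p(\theta \mid D)$$
The posterior predictive distribution can be represented as a marginalization over $\theta$:
$$p(x_{\text{new}} \mid D) = \int p(x_{\text{new}}, \theta \mid D)\, d\theta$$
We can substitute the joint distribution above, assuming the new data is conditionally independent of $D$ given $\theta$, so that $p(x_{\text{new}} \mid \theta, D) = p(x_{\text{new}} \mid \theta)$:
$$p(x_{\text{new}} \mid D) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid D)\, d\theta$$
This gives the first form.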
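As a minimal sketch of this marginalization, assume a coin-flip model: a Bernoulli likelihood with a Beta(a, b) prior on the parameter. This model is chosen purely for illustration and is not part of the notes, and the function names are hypothetical. The posterior is then Beta(a + heads, b + tails), and the integral can be approximated by averaging the likelihood of the new data over samples drawn from that posterior. For this particular model a closed form exists, which gives something to check against.

```python
import random


def posterior_predictive_mc(heads, tails, a=1.0, b=1.0, n_samples=50_000, seed=0):
    """Monte Carlo estimate of p(x_new = 1 | D) for a Bernoulli model.

    Assumption: Beta(a, b) prior, so the posterior is Beta(a + heads, b + tails).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw theta from the posterior p(theta | D).
        theta = rng.betavariate(a + heads, b + tails)
        # For a Bernoulli likelihood, p(x_new = 1 | theta) is theta itself.
        total += theta
    # The sample average approximates the integral over theta.
    return total / n_samples


def posterior_predictive_exact(heads, tails, a=1.0, b=1.0):
    """Closed-form p(x_new = 1 | D): the mean of the Beta posterior."""
    return (a + heads) / (a + b + heads + tails)


if __name__ == "__main__":
    print(posterior_predictive_mc(7, 3))
    print(posterior_predictive_exact(7, 3))
```

The two values should agree closely. The sampling version is the one that generalizes, since most models have no closed-form predictive distribution.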
Bayes model averaging (BMA)
This can be viewed as a form of Bayes model averaging (BMA), since we make predictions using an infinite set of models (one per parameter value), each weighted by how likely it is under the posterior. BMA reduces the chance of overfitting because we do not rely on just the single best model.
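The weighted-average view can be sketched by replacing the infinite set of models with a finite grid of candidate parameter values. The example below is an illustration under stated assumptions, not a method from the notes: it assumes a coin-flip (Bernoulli) model with a uniform prior, and the function names and the `grid_size` parameter are hypothetical. It compares the averaged prediction with a plug-in prediction that uses only the single best (maximum-likelihood) parameter.

```python
def plug_in_prediction(heads, tails):
    """Predict p(x_new = 1) from the single best (maximum-likelihood) parameter."""
    return heads / (heads + tails)


def bma_prediction(heads, tails, grid_size=1000):
    """Average the predictions of many candidate models, weighted by posterior.

    Assumption: uniform prior over theta, so each weight is proportional to
    the likelihood theta**heads * (1 - theta)**tails.
    """
    # Candidate models: midpoints of grid_size equal-width bins on (0, 1).
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Unnormalized posterior weight of each candidate model.
    weights = [t ** heads * (1.0 - t) ** tails for t in thetas]
    total = sum(weights)
    # Each model predicts p(x_new = 1 | theta) = theta; combine by weight.
    return sum(w * t for w, t in zip(weights, thetas)) / total


if __name__ == "__main__":
    # Three heads in three flips: a very small data set.
    print(plug_in_prediction(3, 0))
    print(bma_prediction(3, 0))
```

With three heads in three flips, the plug-in prediction is fully confident that the next flip is heads, while the averaged prediction stays near 0.8 because models with smaller parameter values still carry some posterior weight.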