- $p(\tilde{x} \mid \theta)$ is the likelihood of the new data $\tilde{x}$ given the parameter $\theta$
- $p(\theta \mid D)$ is the posterior distribution of the parameter given the observed data $D$
How to derive this form
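Let's start with the joint probability of the new data $\tilde{x}$ and the parameter $\theta$ given the observed data $D$, factored with the product rule:
$$p(\tilde{x}, \theta \mid D) = p(\tilde{x} \mid \theta, D)\, p(\theta \mid D)$$
The posterior predictive distribution can be written as a marginalization over $\theta$:
$$p(\tilde{x} \mid D) = \int p(\tilde{x}, \theta \mid D)\, d\theta$$
Substituting the joint distribution above:
$$p(\tilde{x} \mid D) = \int p(\tilde{x} \mid \theta, D)\, p(\theta \mid D)\, d\theta$$
Assuming conditional independence of $\tilde{x}$ and $D$ given $\theta$:
$$p(\tilde{x} \mid \theta, D) = p(\tilde{x} \mid \theta)$$
And then we get the first form:
$$p(\tilde{x} \mid D) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid D)\, d\theta$$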
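The derived form can be checked numerically on a small example. The sketch below is an illustration under assumptions that are not in the original text: a Bernoulli (coin flip) likelihood, a Beta(1, 1) prior, and made-up counts of 7 heads and 3 tails. It integrates likelihood times posterior over the parameter with a simple midpoint rule and compares the result with the known closed form for this conjugate pair. The function names are hypothetical.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, using log-gamma for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def posterior_predictive_heads(heads, tails, a0=1.0, b0=1.0, grid=20_000):
    """Approximate p(x_new = 1 | D) as the integral of
    p(x_new = 1 | theta) * p(theta | D) over theta, via the midpoint rule."""
    # Assumed conjugate update: Beta prior + Bernoulli data gives a Beta posterior.
    a, b = a0 + heads, b0 + tails
    width = 1.0 / grid
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) * width   # midpoint of the i-th cell, strictly inside (0, 1)
        likelihood = theta          # p(x_new = 1 | theta) for a Bernoulli model
        total += likelihood * beta_pdf(theta, a, b) * width
    return total


# Illustrative data (an assumption, not from the text): 7 heads, 3 tails.
heads, tails = 7, 3
numeric = posterior_predictive_heads(heads, tails)
# Closed form for this model: the posterior mean (a0 + heads) / (a0 + b0 + heads + tails).
closed_form = (1.0 + heads) / (2.0 + heads + tails)

print(f"numeric integral: {numeric:.6f}")
print(f"closed form:      {closed_form:.6f}")
```

The two numbers should agree closely (both near 2/3), which is what the marginalization predicts for this particular model. For non-conjugate models the integral usually has no closed form and is approximated instead.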
Bayes model averaging (BMA)
This can be viewed as a form of Bayes model averaging (BMA), since we are making predictions using an infinite set of models (one for each parameter value), each weighted by how likely it is under the posterior. BMA reduces the chance of overfitting because we are not relying on just the single best model.
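To make the overfitting point concrete, here is a minimal sketch comparing a plug-in prediction (using only the single best parameter value) with a Monte Carlo version of BMA. The setup is assumed for illustration only: a Bernoulli model, a Beta(1, 1) prior, and a tiny made-up dataset of 3 heads and 0 tails. The function names are hypothetical.

```python
import random


def plug_in_prediction(heads, tails):
    """Predict with the single best model: the maximum likelihood estimate of theta."""
    return heads / (heads + tails)


def bma_prediction(heads, tails, a0=1.0, b0=1.0, num_samples=100_000, seed=0):
    """Average p(x_new = 1 | theta) = theta over posterior samples of theta.

    Assumes a Beta(a0, b0) prior, so the posterior is Beta(a0 + heads, b0 + tails).
    """
    rng = random.Random(seed)
    a, b = a0 + heads, b0 + tails
    samples = (rng.betavariate(a, b) for _ in range(num_samples))
    return sum(samples) / num_samples


# Illustrative data (an assumption, not from the text): 3 heads, no tails.
heads, tails = 3, 0
plug_in = plug_in_prediction(heads, tails)
bma = bma_prediction(heads, tails)

print(f"plug-in p(heads): {plug_in:.3f}")
print(f"BMA p(heads):     {bma:.3f}")
```

With so little data the plug-in estimate assigns probability 1 to heads and therefore treats tails as impossible, while the averaged prediction stays close to the posterior mean of 4/5 and keeps some probability on tails.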