Posterior Predictive Distribution

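Unlike the prior or the posterior, which are distributions over the parameters $\theta$, the posterior predictive distribution models new data given the existing data. Rather than committing to a single optimal value of $\theta$, it considers the entire posterior $p(\theta|\mathcal{D})$, so model uncertainty carries through to the prediction. This is similar to the structure used in model-based RL. The prediction is obtained by integrating the likelihood of the new data over the posterior: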

p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y} | \theta) p(\theta | \mathcal{D}) \, d\theta \tag{0}

$p(\tilde{y} | \theta)$ is the likelihood of the new data $\tilde{y}$ given the parameter $\theta$.

$p(\theta | \mathcal{D})$ is the posterior distribution of the parameter $\theta$ given the data $\mathcal{D}$.
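The integral in (0) can often be approximated by sampling: draw $\theta_s \sim p(\theta | \mathcal{D})$ and average $p(\tilde{y} | \theta_s)$ over the draws. The sketch below illustrates this under assumptions that are not part of the text above: a Bernoulli likelihood with a Beta(a, b) prior, chosen only because its posterior is a Beta distribution that is easy to sample. The function name `posterior_predictive_mc` and all its parameters are hypothetical.

```python
import random


def posterior_predictive_mc(successes, trials, a=1.0, b=1.0,
                            num_samples=100_000, seed=0):
    """Monte Carlo estimate of p(y_new = 1 | D).

    Assumed model (illustrative only): y_i | theta ~ Bernoulli(theta),
    theta ~ Beta(a, b), so theta | D ~ Beta(a + k, b + n - k).
    """
    rng = random.Random(seed)
    post_a = a + successes
    post_b = b + (trials - successes)

    total = 0.0
    for _ in range(num_samples):
        # Draw theta_s from the posterior p(theta | D).
        theta = rng.betavariate(post_a, post_b)
        # For a Bernoulli likelihood, p(y_new = 1 | theta_s) = theta_s.
        total += theta
    # Average of the likelihood over posterior draws approximates (0).
    return total / num_samples


estimate = posterior_predictive_mc(successes=7, trials=10)
# Closed-form value for this conjugate model, used as a sanity check.
exact = (1.0 + 7) / (1.0 + 1.0 + 10)
print(estimate, exact)
```

For this conjugate model the integral has a closed form, so the sampler is only a check; the same averaging pattern applies when the posterior is available solely through samples (for example from MCMC).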

How to derive this form

Let's start with the joint probability of $\tilde{y}$ and $\theta$ given $\mathcal{D}$, factored with the product rule:

p(\tilde{y}, \theta \mid \mathcal{D}) = p(\tilde{y} \mid \theta, \mathcal{D}) \cdot p(\theta \mid \mathcal{D}) \tag{1}

The posterior predictive distribution is obtained by marginalizing $\theta$ out of the joint distribution:

p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y}, \theta | \mathcal{D}) \, d\theta \tag{2}

Substituting the factorization (1) into (2) gives

p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y} | \theta, \mathcal{D}) \cdot p(\theta | \mathcal{D}) \, d\theta \tag{3}

Assuming $\tilde{y}$ and $\mathcal{D}$ are conditionally independent given $\theta$, the first factor simplifies:

p(\tilde{y} | \theta, \mathcal{D}) = p(\tilde{y} | \theta) \tag{4}

Substituting (4) into (3) recovers the first form (0).
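As a worked instance of the derivation, consider a model that is assumed here purely for illustration: observations $y_i | \theta \sim \mathrm{Bernoulli}(\theta)$ with prior $\theta \sim \mathrm{Beta}(a, b)$, and data $\mathcal{D}$ containing $k$ successes in $n$ trials. In that case the integral in (0) can be evaluated in closed form:

```latex
\begin{align*}
p(\theta \mid \mathcal{D}) &= \mathrm{Beta}(\theta \mid a + k,\; b + n - k) \\
p(\tilde{y} = 1 \mid \mathcal{D})
  &= \int_0^1 p(\tilde{y} = 1 \mid \theta)\, p(\theta \mid \mathcal{D}) \, d\theta \\
  &= \int_0^1 \theta \; \mathrm{Beta}(\theta \mid a + k,\; b + n - k) \, d\theta \\
  &= \frac{a + k}{a + b + n}
\end{align*}
```

The last step uses the fact that the mean of a $\mathrm{Beta}(\alpha, \beta)$ distribution is $\alpha / (\alpha + \beta)$. With a uniform prior ($a = b = 1$) and $k = 7$, $n = 10$, this gives $8/12 \approx 0.667$.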

Bayes model averaging (BMA)

This can be viewed as a form of Bayes model averaging (BMA): predictions are made using an infinite set of models, one for each parameter value $\theta$, with each model's prediction weighted by its posterior probability $p(\theta | \mathcal{D})$. Using BMA reduces the chance of overfitting, since the prediction does not rely on the single best model alone.
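The averaging view can be made concrete with a small sketch. It assumes, for illustration only, a Bernoulli likelihood, a uniform prior, and a finite grid of candidate $\theta$ values standing in for the infinite set of models; the function name `bma_vs_plugin` is hypothetical. It compares the averaged prediction with the plug-in prediction from the single best (maximum likelihood) parameter.

```python
def bma_vs_plugin(successes, trials, grid_size=1001):
    """Compare a model-averaged prediction with a single-best-model one.

    Assumptions (illustrative): Bernoulli likelihood, uniform prior on
    theta, and a grid of theta values approximating the integral.
    """
    # Each grid point is one candidate model.
    thetas = [i / (grid_size - 1) for i in range(grid_size)]

    # Under a uniform prior the posterior weight is proportional
    # to the likelihood of the observed data.
    failures = trials - successes
    weights = [t ** successes * (1.0 - t) ** failures for t in thetas]
    normalizer = sum(weights)
    weights = [w / normalizer for w in weights]

    # BMA: each model predicts p(y_new = 1 | theta) = theta,
    # weighted by its posterior probability.
    bma = sum(w * t for w, t in zip(weights, thetas))

    # Plug-in: use only the maximum likelihood estimate of theta.
    plugin = successes / trials
    return bma, plugin


# Three successes in three trials: a tiny data set.
bma, plugin = bma_vs_plugin(successes=3, trials=3)
print(bma, plugin)
```

With only three observations, all successes, the plug-in estimate assigns probability 1 to another success, while the averaged prediction stays close to 0.8 because models with smaller $\theta$ still carry posterior weight.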
