Posterior Predictive Distribution

Creator: Seonglae Cho
Created: 2024 Nov 26 12:09
Edited: 2025 Apr 29 2:50
Refs: Posterior

Posterior Predictive Distribution

Unlike the Prior or Posterior, modeling the distribution of new data given existing data involves considering the entire distribution over parameters rather than just finding an optimal parameter value \theta. This is similar to the structure used in
Model based RL
. It accounts for model uncertainty through the posterior p(\theta | \mathcal{D}):
p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y} | \theta) p(\theta | \mathcal{D}) \, d\theta \tag{0}
  • p(\tilde{y} | \theta) is the likelihood of the new data \tilde{y} given the parameter \theta
  • p(\theta | \mathcal{D}) is the posterior distribution of the parameter \theta given the data \mathcal{D}
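
As a concrete illustration (a standard Beta-Bernoulli example, not from the original note): with a Bernoulli likelihood and a \mathrm{Beta}(\alpha, \beta) prior, observing N_1 ones and N_0 zeros makes the posterior \mathrm{Beta}(\alpha + N_1, \beta + N_0), and the integral in (0) has a closed form:
p(\tilde{y} = 1 | \mathcal{D}) = \int_0^1 \theta \, \mathrm{Beta}(\theta | \alpha + N_1, \beta + N_0) \, d\theta = \frac{\alpha + N_1}{\alpha + \beta + N_1 + N_0}
With \alpha = \beta = 1 this reduces to Laplace's rule of succession, (N_1 + 1)/(N + 2).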

How to derive this form

Let's start with the
Joint Probability
p(\tilde{y}, \theta \mid \mathcal{D}) = p(\tilde{y} \mid \theta, \mathcal{D}) \cdot p(\theta \mid \mathcal{D}) \tag{1}
The Posterior Predictive Distribution can be represented via
Marginalization
p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y}, \theta | \mathcal{D}) \, d\theta \tag{2}
Substituting the joint distribution from (1):
p(\tilde{y} | \mathcal{D}) = \int p(\tilde{y} | \theta, \mathcal{D}) \cdot p(\theta | \mathcal{D}) \, d\theta \tag{3}
Assuming
Conditionally Independent
of \tilde{y} and \mathcal{D} given \theta:
p(\tilde{y} | \theta, \mathcal{D}) = p(\tilde{y} | \theta) \tag{4}
Substituting (4) into (3) recovers the first form (0).
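
A minimal numerical sketch of (0), assuming the same hypothetical Beta-Bernoulli setup as above: sample \theta from the posterior, then sample \tilde{y} from the likelihood, and average; the Monte Carlo estimate should match the closed-form predictive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 7 ones and 3 zeros, with a Beta(1, 1) prior,
# so the posterior p(theta | D) is Beta(1 + 7, 1 + 3).
n1, n0 = 7, 3
a, b = 1 + n1, 1 + n0

# Approximate (0): theta ~ p(theta | D), then y ~ p(y | theta), then average.
thetas = rng.beta(a, b, size=100_000)
y_new = rng.binomial(1, thetas)

print(y_new.mean())   # Monte Carlo estimate of p(y=1 | D), ~0.667
print(a / (a + b))    # closed form (alpha + N1) / (alpha + beta + N) = 8/12
```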

Bayes model averaging (BMA)

This can be viewed as a form of Bayes model averaging (BMA), since we are making predictions using an infinite set of models (one per parameter value), each weighted by how likely it is. BMA reduces the chance of overfitting because we do not commit to the single best model, as the sketch below illustrates.
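
A small sketch of why the averaging helps (hypothetical numbers, Beta-Bernoulli again): after seeing only two ones, the plug-in prediction from the single best (MAP) model is completely certain, while the posterior-averaged prediction hedges.

```python
# Hypothetical: 2 ones, 0 zeros observed, Beta(1, 1) prior -> Beta(3, 1) posterior.
n1, n0 = 2, 0
a, b = 1 + n1, 1 + n0

# Plug-in: predict with the single MAP parameter (the mode of Beta(3, 1)).
theta_map = (a - 1) / (a + b - 2)   # = 1.0
p_plugin = theta_map                # p(y=1) = 1.0, certain after only 2 observations

# BMA-style: average the prediction over the whole posterior (its mean).
p_bma = a / (a + b)                 # = 0.75

print(p_plugin, p_bma)              # 1.0 vs 0.75 -- averaging tempers the prediction
```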