PAC Bound

Creator: Seonglae Cho
Created: 2025 Mar 23 23:28
Edited: 2025 Jun 2 16:29

Generalization bounds are a safety check

They give a theoretical guarantee on the performance of a learning algorithm on any unseen data.
Just because a model fits the training data well doesn't guarantee it will perform well in practice. However, we can mathematically bound how well it generalizes.
The PAC-Bayes Bound provides a probabilistic guarantee for this, showing that the gap between the training error $R_{in}(Q)$ and the test error $R_{out}(Q)$ can be bounded by a complexity term based on the Divergence Distance between the Prior and the Posterior. This can be expressed simply as follows:
$$R_{out}(Q) \le R_{in}(Q) + \mathrm{Complexity}(\mathrm{Divergence}, m, \delta)$$
where $m$ is the sample count and $\delta$ is the probability of being misled by the training set.
With high probability, the generalization error of a hypothesis $h$ is at most something we can control and even compute. There is a probabilistic guarantee that the true error can be upper bounded by adding a margin $\epsilon$ to the training error.
$$\mathbb{P}\left[R_{out}(h) \le R_{in}(h) + \epsilon(m, \delta)\right] \ge 1 - \delta$$
This guarantee holds for any $\delta > 0$, with the margin $\epsilon(m, \delta)$ growing as $\delta$ shrinks.
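As a concrete, illustrative instance of such a margin (not part of the original statement), the classical Hoeffding bound for a single fixed hypothesis gives $\epsilon(m, \delta) = \sqrt{\ln(2/\delta)/(2m)}$ for losses in $[0, 1]$; the sketch below simply evaluates it.

```python
import math

def hoeffding_margin(m: int, delta: float) -> float:
    """Two-sided Hoeffding margin eps(m, delta) for a single fixed hypothesis.

    With probability at least 1 - delta over m i.i.d. samples,
    R_out(h) <= R_in(h) + eps(m, delta) for losses bounded in [0, 1].
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * m))

# The margin shrinks like 1/sqrt(m): more data gives a tighter guarantee.
for m in (100, 1_000, 10_000, 100_000):
    print(f"m={m:>7}, delta=0.05 -> eps={hoeffding_margin(m, 0.05):.4f}")
```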
  • Prior
    $P$ here means the exploration mechanism over the hypothesis space $\mathcal{H}$
  • Posterior
    $Q$ here means the prior twisted after confronting the data
$$R_{\mathrm{in}}(Q) \equiv \int_{\mathcal{H}} R_{\mathrm{in}}(h)\,dQ(h), \qquad R_{\mathrm{out}}(Q) \equiv \int_{\mathcal{H}} R_{\mathrm{out}}(h)\,dQ(h)$$
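These Gibbs risks are expectations over hypotheses drawn from $Q$, and in practice they are typically estimated by sampling. A minimal sketch, assuming hypothetical `sample_hypothesis` and `empirical_risk` callables (not part of the original text):

```python
def gibbs_risk(sample_hypothesis, empirical_risk, n_samples: int = 100) -> float:
    """Monte Carlo estimate of R_in(Q) = E_{h ~ Q}[R_in(h)].

    sample_hypothesis: draws one hypothesis h from the posterior Q (assumed interface).
    empirical_risk:    maps a hypothesis h to its training error R_in(h) in [0, 1].
    """
    risks = [empirical_risk(sample_hypothesis()) for _ in range(n_samples)]
    return sum(risks) / len(risks)
```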

Prototypical bound (McAllester, 1998, 1999)

Analysis of the expected error over a probability distribution $Q$ instead of a single hypothesis $h$.
$$R_{\text{out}}(Q) \le R_{\text{in}}(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{\xi(m)}{\delta}}{2m}}$$
One can think of $\epsilon(m, \delta)$ here as scaling like $\frac{\mathrm{Complexity} + \log\frac{1}{\delta}}{\sqrt{m}}$.
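To make the complexity term concrete, the sketch below evaluates this bound numerically; the choice $\xi(m) = 2\sqrt{m}$ and the specific KL values are illustrative assumptions, not fixed by the text.

```python
import math

def pac_bayes_bound(train_risk: float, kl: float, m: int, delta: float) -> float:
    """McAllester-style PAC-Bayes upper bound on R_out(Q).

    train_risk is the Gibbs training error R_in(Q), kl is KL(Q || P),
    and xi(m) = 2 * sqrt(m) is one common choice for the log factor.
    """
    xi_m = 2.0 * math.sqrt(m)
    complexity = math.sqrt((kl + math.log(xi_m / delta)) / (2.0 * m))
    return train_risk + complexity

# A posterior that stays close to the prior (small KL) gives a tighter bound.
for kl in (1.0, 10.0, 100.0):
    print(f"KL={kl:>6.1f} -> bound={pac_bayes_bound(0.05, kl, m=50_000, delta=0.05):.4f}")
```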
  • The bound depends on the distance between the prior and the posterior
  • A better prior (closer to the posterior) leads to a tighter bound
  • Learn the prior $P$ with part of the data
  • Introduce the learned prior into the bound
Dziugaite and Roy (2017) and Neyshabur et al. (2017) have derived some of the tightest deep learning bounds in this way.
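A minimal sketch of that data-split recipe, assuming isotropic Gaussian $P$ and $Q$ over a parameter vector and hypothetical `fit_mean` / `risk` routines (the half-half split, the closed-form KL, and evaluating the risk at the posterior mean are simplifying assumptions, not the exact procedure of the cited papers):

```python
import math
import numpy as np

def gaussian_kl(mu_q: np.ndarray, mu_p: np.ndarray, sigma: float) -> float:
    """KL(Q || P) for isotropic Gaussians N(mu_q, sigma^2 I) and N(mu_p, sigma^2 I)."""
    return float(np.sum((mu_q - mu_p) ** 2) / (2.0 * sigma ** 2))

def bound_with_learned_prior(X, y, fit_mean, risk, sigma=0.1, delta=0.05) -> float:
    """McAllester-style bound with a prior centred on parameters learned from held-out data.

    fit_mean(X, y) -> parameter vector (hypothetical training routine).
    risk(params, X, y) -> empirical error in [0, 1] (hypothetical evaluator),
    used here at the posterior mean as a stand-in for the Gibbs risk R_in(Q).
    """
    half = len(X) // 2
    mu_p = fit_mean(X[:half], y[:half])   # prior centre: first half only
    mu_q = fit_mean(X, y)                 # posterior centre: may use all the data
    m = len(X) - half                     # the bound covers the second half,
    kl = gaussian_kl(mu_q, mu_p, sigma)   # which the prior never saw
    train_risk = risk(mu_q, X[half:], y[half:])
    complexity = math.sqrt((kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m))
    return train_risk + complexity
```

A prior centred near the eventual posterior shrinks the KL term, which is exactly why learning the prior on part of the data tightens the bound.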
 
 
 
 