Generalization bounds are a safety check
They give a theoretical guarantee on the performance of a learning algorithm on unseen data.
Fitting the training data well does not by itself guarantee that a model will perform well in practice. However, we can mathematically bound how well it generalizes.
The PAC-Bayes bound provides a probabilistic guarantee for this, showing that the gap between training error and test error is controlled by a complexity term based on the KL divergence between the prior and the posterior. In its prototypical form, with probability at least $1 - \delta$ over the draw of the training set, for every posterior $Q$:

$$\mathbb{E}_{h \sim Q}[R(h)] \;\le\; \mathbb{E}_{h \sim Q}[\hat{R}(h)] + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m}{\delta}}{2(m-1)}}$$

where $m$ is the sample count and $\delta$ is the probability of being misled by the training set.
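To make the bound concrete, here is a minimal Python sketch that plugs numbers into this formula; the function name and the example values (training error, KL, sample count) are hypothetical illustrations, not taken from the papers cited below.

```python
import math

def mcallester_bound(train_error: float, kl: float, m: int, delta: float = 0.05) -> float:
    """McAllester-style PAC-Bayes upper bound on the expected true error under Q.

    train_error: expected empirical error under the posterior Q
    kl: KL(Q || P) between posterior and prior
    m: number of training samples
    delta: allowed probability of being misled by the training set
    """
    complexity = math.sqrt((kl + math.log(m / delta)) / (2 * (m - 1)))
    return train_error + complexity

# Hypothetical example: 10,000 samples, 5% training error, moderate KL
print(mcallester_bound(train_error=0.05, kl=50.0, m=10_000, delta=0.05))
```

Note that the complexity term shrinks as $m$ grows and inflates as the posterior moves away from the prior, which is the trade-off the rest of this note exploits.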
With high probability, the generalization error of a hypothesis is at most something we can control and even compute, and this holds simultaneously for any posterior $Q$. In other words, there is a probabilistic guarantee that the true error can be upper bounded by adding a computable margin to the training error.
- Prior here means the exploration mechanism of the hypothesis space $\mathcal{H}$, fixed before seeing the data
- Posterior here means the prior twisted after confronting the data
Prototypical bound (McAllester, 1998, 1999)
Analysis of the expected error over a probability distribution $Q$ instead of a single hypothesis $h$; the single-hypothesis case is recovered when we think of $Q$ as the point mass $\delta_h$.
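Sanity check (assuming a countable hypothesis space so that the prior mass $P(h)$ is well defined): with $Q = \delta_h$ the KL term collapses to $\ln\frac{1}{P(h)}$, and the PAC-Bayes bound reduces to an Occam-style bound for a single hypothesis:

$$\mathrm{KL}(\delta_h \,\|\, P) = \ln\frac{1}{P(h)} \quad\Rightarrow\quad R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{m}{\delta}}{2(m-1)}}$$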
- The bound depends on the KL distance between the prior and the posterior
- A better prior (closer to the posterior) leads to a tighter bound
- Learn the prior $P$ with part of the data
- Introduce the learnt prior into the bound (see the sketch below)
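A minimal sketch of this recipe, assuming diagonal Gaussian prior and posterior over the weights (so the KL has a closed form); the weight vectors, variances, and error numbers below are hypothetical stand-ins for actual trained models:

```python
import math
import numpy as np

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(Q || P) between diagonal Gaussians over the weights."""
    return 0.5 * float(np.sum(
        (sig_q**2 + (mu_q - mu_p)**2) / sig_p**2 - 1.0
        + 2.0 * (np.log(sig_p) - np.log(sig_q))
    ))

rng = np.random.default_rng(0)
d = 100                                         # weight dimension

# Split the data conceptually: the first split is used ONLY to choose the prior.
mu_prior = rng.normal(size=d)                   # stand-in for weights trained on split 1
mu_post = mu_prior + 0.1 * rng.normal(size=d)   # weights after training on split 2

kl = kl_diag_gaussians(mu_post, np.full(d, 0.05), mu_prior, np.full(d, 0.1))

# Evaluate the bound only on the data the prior never saw.
m, delta, train_err = 5_000, 0.05, 0.08         # hypothetical numbers
bound = train_err + math.sqrt((kl + math.log(m / delta)) / (2 * (m - 1)))
print(f"KL = {kl:.2f}, bound on true error = {bound:.3f}")
```

Because the prior mean is fitted on data the bound never touches, the posterior ends up close to the prior, $\mathrm{KL}(Q \,\|\, P)$ stays small, and the bound tightens, which is exactly the effect described in the list above.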
Dziugaite and Roy (2017) and Neyshabur et al. (2017) have derived some of the tightest deep learning bounds in this way.
Neyshabur, B., Bhojanapalli, S., McAllester, D., & Srebro, N. (2017). Exploring Generalization in Deep Learning. https://arxiv.org/abs/1706.08947
Dziugaite, G. K., & Roy, D. M. (2017). Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. https://arxiv.org/abs/1703.11008


Seonglae Cho