PAC

Creator
Seonglae Cho
Created
2023 Nov 20 6:52
Edited
2025 Jun 2 16:30

Probably Approximately Correct framework

In statistical learning theory, what matters is not the average error but the guarantee that the error is small with high confidence. We therefore work with the probability tail where the error exceeds a given value, and introduce a Confidence Parameter similar to the p-value in Significance Testing. This ensures that the probability of the learning result being sufficiently accurate (approximately correct) is at least $1 - \delta$.
  • Empirical risk $R_{in}(h) = \frac{1}{m} \sum_{i=1}^{m} l(h(X_i), Y_i)$ where $h(X)$ is the predicted output
  • Theoretical risk $R_{out}(h) = \mathbb{E}[l(h(X), Y)]$
  • Generalization gap $\Delta(h) = R_{out}(h) - R_{in}(h)$
  • Upper bounds $\Delta(h) \le \epsilon(m, \delta)$, i.e. $R_{out}(h) \le R_{in}(h) + \epsilon(m, \delta)$, with high probability (see the sketch after this list)
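A minimal sketch of these quantities, assuming a linear threshold hypothesis and 0-1 loss (both illustrative choices, not fixed by the note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hypothesis: a fixed linear threshold h(X) = sign(X @ w), 0-1 loss.
def zero_one_loss(pred, y):
    return (pred != y).astype(float)

def empirical_risk(w, X, Y):
    """R_in(h) = (1/m) * sum_i l(h(X_i), Y_i)."""
    return zero_one_loss(np.sign(X @ w), Y).mean()

# Toy data: R_out(h) = E[l(h(X), Y)] is unknown in practice; here we
# approximate it with a large fresh sample from the same distribution.
w_true, w = np.array([1.0, -1.0]), np.array([0.9, -1.1])
def draw(m):
    X = rng.normal(size=(m, 2))
    return X, np.sign(X @ w_true)

X_train, Y_train = draw(100)
X_big, Y_big = draw(1_000_000)
r_in = empirical_risk(w, X_train, Y_train)
r_out = empirical_risk(w, X_big, Y_big)   # stand-in for the true risk
print(f"R_in={r_in:.3f}  R_out~{r_out:.3f}  gap~{r_out - r_in:+.3f}")
```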
With high probability, the generalization error of a hypothesis $h$ is at most something we can control and even compute: the true error is upper bounded by the training error plus a margin $\epsilon$. For any $\delta > 0$,

$$\mathbb{P}\big[R_{out}(h) \le R_{in}(h) + \epsilon(m, \delta)\big] \ge 1 - \delta$$
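For a single fixed hypothesis with loss bounded in $[0, 1]$, Hoeffding's inequality gives one concrete form of $\epsilon(m, \delta)$ (a standard instantiation, not spelled out in this note):

$$\mathbb{P}\big[R_{out}(h) - R_{in}(h) \ge \epsilon\big] \le e^{-2m\epsilon^2} \quad\Longrightarrow\quad \epsilon(m, \delta) = \sqrt{\frac{\ln(1/\delta)}{2m}}$$

A union bound over a finite class $\mathcal{H}$ adds the complexity term, $\epsilon(m, \delta) = \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2m}}$, which is the $\frac{\text{Complexity} + \log\frac{1}{\delta}}{\sqrt{m}}$ shape used below.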
  • Prior
    $P$ here means the exploration mechanism of $\mathcal{H}$
  • Posterior
    $Q$ here means the prior twisted after confronting the data
$$R_{\mathrm{in}}(Q) \equiv \int_{\mathcal{H}} R_{\mathrm{in}}(h)\,dQ(h), \qquad R_{\mathrm{out}}(Q) \equiv \int_{\mathcal{H}} R_{\mathrm{out}}(h)\,dQ(h)$$
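These Gibbs risks are expectations over hypotheses, so they can be estimated by sampling from $Q$. A minimal sketch, where `sample_h` and `risk` are hypothetical placeholders for drawing $h \sim Q$ and evaluating a risk of $h$:

```python
import numpy as np

def gibbs_risk(sample_h, risk, n_samples=1_000, seed=0):
    """Monte Carlo estimate of R(Q) = integral of R(h) dQ(h):
    average the risk of hypotheses drawn h ~ Q."""
    rng = np.random.default_rng(seed)
    return float(np.mean([risk(sample_h(rng)) for _ in range(n_samples)]))

# Example shape: Q is a Gaussian over 1-D thresholds t, and my_risk_fn is
# any evaluator of R_in(t) (hypothetical placeholder):
# est = gibbs_risk(lambda rng: rng.normal(0.0, 1.0), my_risk_fn)
```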

Prototypical bound (McAllester, 1998, 1999)

Analysis of the expected error over a probability distribution $Q$ instead of a single hypothesis $h$.
$$R_{\text{out}}(Q) \le R_{\text{in}}(Q) + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{\xi(m)}{\delta}}{2m}}$$
This matches the template above when we think of $\epsilon(m, \delta)$ as $\frac{\text{Complexity} + \log\frac{1}{\delta}}{\sqrt{m}}$, with $\mathrm{KL}(Q \,\|\, P)$ playing the role of the complexity term.
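A sketch of evaluating this bound numerically, assuming $\xi(m) = 2\sqrt{m}$ (a common choice in later refinements; the note leaves $\xi(m)$ unspecified) and a closed-form KL between diagonal Gaussian $Q$ and $P$:

```python
import numpy as np

def kl_diag_gauss(mu_q, sig_q, mu_p, sig_p):
    """KL(Q || P) for diagonal Gaussians, summed over dimensions."""
    return float(np.sum(np.log(sig_p / sig_q)
                        + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5))

def mcallester_bound(r_in_q, kl, m, delta, xi=None):
    """R_out(Q) <= R_in(Q) + sqrt((KL(Q||P) + ln(xi(m)/delta)) / (2m))."""
    if xi is None:
        xi = 2.0 * np.sqrt(m)          # assumed choice of xi(m)
    return r_in_q + np.sqrt((kl + np.log(xi / delta)) / (2.0 * m))

# Toy numbers: posterior slightly shifted and sharpened relative to the prior.
mu_q, sig_q = np.array([0.3, -0.2]), np.array([0.5, 0.5])
mu_p, sig_p = np.zeros(2), np.ones(2)
kl = kl_diag_gauss(mu_q, sig_q, mu_p, sig_p)
print(mcallester_bound(r_in_q=0.10, kl=kl, m=10_000, delta=0.05))
```

Note how the bound loosens as $Q$ moves away from $P$ (larger KL) and tightens as $m$ grows, which is the trade-off the prose above describes.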
Probably Approximately Correct Notion
PAC Bayes results
