First, we introduce agnostic PAC learnability.
Definition 1 [Agnostic PAC learnability, [1]]
A hypothesis class \( \mathcal{H} \) is agnostic PAC learnable if there exist a function \( n_{\mathcal{H}}:(0,1)^2\rightarrow \mathbb{N} \) (the sample complexity) and a learning algorithm with the following property: for every \( \epsilon,\delta \in (0,1) \)
and every distribution \(\mathcal{D}\) over \(\mathcal{X}\times\mathcal{Y}\), when the algorithm is run on \( n \geq n_{\mathcal{H}} (\epsilon,\delta) \) i.i.d. samples generated by \( \mathcal{D}\), it returns a hypothesis \(h\) such that, with probability at least \(1-\delta \),
\begin{align}
L_{\mathcal{D}}(h) \leq \min_{h^{\prime}\in\mathcal{H}} L_{\mathcal{D}}(h^{\prime}) + \epsilon \label{eq:ag_pac_condition}
\end{align}
Uniform convergence plays a central role in learning theory: to understand agnostic PAC learnability we need uniform convergence, and to understand uniform convergence we first need the notion of an $\epsilon$-representative sample.
Definition 2 [\(\epsilon\)-representative sample, [1]]
A training set \(S\) is called \(\epsilon\)-representative if
\begin{equation}
\forall h\in\mathcal{H}, \; |L_S(h) - L_{\mathcal{D}}(h)| \leq \epsilon. \nonumber
\end{equation}
Note that whether a sample is \(\epsilon\)-representative depends on the hypothesis class.
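To make the definition concrete, here is a minimal sketch in Python (the threshold class, the noisy distribution, and names such as `sample` and `emp_risk` are illustrative assumptions, not from [1]): it estimates \(L_{\mathcal{D}}\) by Monte Carlo on a large held-out sample and measures the worst gap \(\max_{h\in\mathcal{H}} |L_S(h)-L_{\mathcal{D}}(h)|\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Distribution D over X x Y: x ~ Uniform[0,1], y = 1{x > 0.3} with 10% label noise.
def sample(n):
    x = rng.uniform(0, 1, n)
    y = (x > 0.3).astype(int)
    flip = rng.uniform(0, 1, n) < 0.1
    y[flip] = 1 - y[flip]
    return x, y

# Finite hypothesis class: threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(0, 1, 21)

def emp_risk(t, x, y):
    # 0-1 loss of h_t on the sample (x, y)
    return np.mean((x > t).astype(int) != y)

x_tr, y_tr = sample(200)       # training set S
x_te, y_te = sample(200_000)   # large sample as a stand-in for D

L_S = np.array([emp_risk(t, x_tr, y_tr) for t in thresholds])  # L_S(h)
L_D = np.array([emp_risk(t, x_te, y_te) for t in thresholds])  # ~ L_D(h)

# S is eps-representative iff this worst gap over H is <= eps.
print("max_h |L_S(h) - L_D(h)| ~", np.max(np.abs(L_S - L_D)))
```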
Lemma 3
Assume that a training set $S$ is $\frac{\epsilon}{2}$-representative. Then any output $h_S$ of the ERM rule, i.e., $h_S \in \operatorname{argmin}_{h\in\mathcal{H}} L_S(h)$, satisfies
$$L_{\mathcal{D}} (h_S) \leq \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h)+\epsilon.$$
Proof.
It suffices to show that $L_{\mathcal{D}}(h_S) \leq L_{\mathcal{D}}(h)+\epsilon$ for every $h\in\mathcal{H}$.
For every $h\in\mathcal{H}$,
$$ L_{\mathcal{D}}(h_S) \leq L_S(h_S)+\frac{\epsilon}{2} \leq L_S(h)+\frac{\epsilon}{2} \leq L_{\mathcal{D}}(h)+\epsilon. $$
The first inequality holds because $S$ is $\epsilon/2$-representative, the second because $h_S$ is an output of the ERM rule (it minimizes $L_S$ over $\mathcal{H}$), and the last by applying the $\epsilon/2$-representativeness of $S$ once more.
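Continuing the sketch above (same illustrative setup), we can check the lemma numerically: set $\epsilon/2$ equal to the observed worst gap, so that $S$ is $\epsilon/2$-representative by construction, and verify the bound for the ERM output.

```python
# Continuing the sketch above: S is (eps/2)-representative by construction
# when eps/2 is set to the observed worst gap over H.
eps = 2 * np.max(np.abs(L_S - L_D))
h_S = np.argmin(L_S)           # ERM rule: minimize the empirical risk over H
best = np.min(L_D)             # min_{h in H} L_D(h)

# Lemma 3: L_D(h_S) <= min_h L_D(h) + eps (tiny tolerance for float rounding).
assert L_D[h_S] <= best + eps + 1e-12
print(f"L_D(h_S) = {L_D[h_S]:.4f} <= {best:.4f} + {eps:.4f}")
```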
In other words, Lemma 3 shows that the ERM rule satisfies the agnostic PAC condition whenever $S$ is $\frac{\epsilon}{2}$-representative. Hence, if a sufficiently large training set is guaranteed to be $\frac{\epsilon}{2}$-representative with high probability, then the ERM rule is an agnostic PAC learner. This guarantee is called the uniform convergence property.
Definition 4 [Uniform convergence, [1]]
A hypothesis class \( \mathcal{H} \) has the uniform convergence property if there exists a function \( n^{\text{UC}}_{\mathcal{H}}:(0,1)^2\rightarrow\mathbb{N} \) such that, for every \( \epsilon,\delta \in (0,1) \) and every distribution \( \mathcal{D} \), a training set \( S \) of \( n \geq n^{\text{UC}}_{\mathcal{H}}(\epsilon,\delta) \) i.i.d. samples drawn from \( \mathcal{D} \) is \( \epsilon \)-representative with probability at least \( 1-\delta \).
Thus, instead of proving agnostic PAC learnability directly, it now suffices to show that the uniform convergence property holds.
For a finite hypothesis class, agnostic PAC learnability can be proved in much the same way as in the realizable PAC learning setting, as recorded below.
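For completeness, the standard route in [1] applies Hoeffding's inequality to each fixed \(h\) and then a union bound over the \(|\mathcal{H}|\) hypotheses; for a loss bounded in \([0,1]\) this gives
\begin{align}
n^{\text{UC}}_{\mathcal{H}}(\epsilon,\delta) \leq \left\lceil \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^{2}} \right\rceil, \qquad
n_{\mathcal{H}}(\epsilon,\delta) \leq n^{\text{UC}}_{\mathcal{H}}\!\left(\tfrac{\epsilon}{2},\delta\right) \leq \left\lceil \frac{2\log(2|\mathcal{H}|/\delta)}{\epsilon^{2}} \right\rceil, \nonumber
\end{align}
so every finite class has the uniform convergence property and is therefore agnostic PAC learnable by Lemma 3.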
References
[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.