First, we introduce agnostic PAC learnability.
Definition 1 [Agnostic PAC learnability, [1]]
A hypothesis class \( \mathcal{H} \) is agnostic PAC learnable if there exist a function \( n_{\mathcal{H}}:(0,1)^2\rightarrow \mathbb{N} \) (the sample complexity) and a learning algorithm with the following property: for every \( \epsilon,\delta \in (0,1) \)
and every distribution \(\mathcal{D}\) over \(\mathcal{X}\times\mathcal{Y}\), when the algorithm is run on \( n \geq n_{\mathcal{H}} (\epsilon,\delta) \) i.i.d. samples generated by \( \mathcal{D}\), it returns a hypothesis \(h\) such that, with probability at least \(1-\delta \),
\begin{align}
L_{\mathcal{D}}(h) \leq \min_{h^{\prime}\in\mathcal{H}} L_{\mathcal{D}}(h^{\prime}) + \epsilon \label{eq:ag_pac_condition}
\end{align}
Uniform convergence plays a central role in learning theory: to understand agnostic PAC learnability we need uniform convergence, and to understand uniform convergence we first need the notion of an $\epsilon$-representative sample.
Definition 2 [\(\epsilon\)-representative sample, [1]]
A training set \(S\) is called \(\epsilon\)-representative if
\begin{equation}
\forall h\in\mathcal{H}, \; |L_S(h) - L_{\mathcal{D}}(h)| \leq \epsilon. \nonumber
\end{equation}
Note that whether a sample is \(\epsilon\)-representative depends on the hypothesis class.
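To make the definition concrete, here is a minimal sketch in Python (the threshold class, the noisy distribution, and names such as `sample` and `emp_risk` are illustrative assumptions, not from [1]): it estimates \(L_{\mathcal{D}}\) by Monte Carlo on a large held-out sample and measures the worst gap \(\max_{h\in\mathcal{H}} |L_S(h)-L_{\mathcal{D}}(h)|\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Distribution D over X x Y: x ~ Uniform[0,1], y = 1{x > 0.3} with 10% label noise.
def sample(n):
    x = rng.uniform(0, 1, n)
    y = (x > 0.3).astype(int)
    flip = rng.uniform(0, 1, n) < 0.1
    y[flip] = 1 - y[flip]
    return x, y

# Finite hypothesis class: threshold classifiers h_t(x) = 1{x > t}.
thresholds = np.linspace(0, 1, 21)

def emp_risk(t, x, y):
    # 0-1 loss of h_t on the sample (x, y)
    return np.mean((x > t).astype(int) != y)

x_tr, y_tr = sample(200)       # training set S
x_te, y_te = sample(200_000)   # large sample as a stand-in for D

L_S = np.array([emp_risk(t, x_tr, y_tr) for t in thresholds])  # L_S(h)
L_D = np.array([emp_risk(t, x_te, y_te) for t in thresholds])  # ~ L_D(h)

# S is eps-representative iff this worst gap over H is <= eps.
print("max_h |L_S(h) - L_D(h)| ~", np.max(np.abs(L_S - L_D)))
```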
Lemma 3
Assume that a training set $S$ is $\frac{\epsilon}{2}$-representative. Then any output $h_S$ of the ERM rule, i.e., $h_S \in \operatorname{argmin}_{h\in\mathcal{H}} L_S(h)$, satisfies
$$L_{\mathcal{D}} (h_S) \leq \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h)+\epsilon.$$
Proof.
It suffices to show that $L_{\mathcal{D}}(h_S) \leq L_{\mathcal{D}}(h)+\epsilon$ for every $h\in\mathcal{H}$.
For every $h\in\mathcal{H}$,
$$ L_{\mathcal{D}}(h_S) \leq L_S(h_S)+\frac{\epsilon}{2} \leq L_S(h)+\frac{\epsilon}{2} \leq L_{\mathcal{D}}(h)+\epsilon. $$
The first inequality holds because $S$ is $\epsilon/2$-representative, the second because $h_S$ is an output of the ERM rule (it minimizes $L_S$ over $\mathcal{H}$), and the last by applying the $\epsilon/2$-representativeness of $S$ once more.
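Continuing the sketch above (same illustrative setup), we can check the lemma numerically: set $\epsilon/2$ equal to the observed worst gap, so that $S$ is $\epsilon/2$-representative by construction, and verify the bound for the ERM output.

```python
# Continuing the sketch above: S is (eps/2)-representative by construction
# when eps/2 is set to the observed worst gap over H.
eps = 2 * np.max(np.abs(L_S - L_D))
h_S = np.argmin(L_S)           # ERM rule: minimize the empirical risk over H
best = np.min(L_D)             # min_{h in H} L_D(h)

# Lemma 3: L_D(h_S) <= min_h L_D(h) + eps (tiny tolerance for float rounding).
assert L_D[h_S] <= best + eps + 1e-12
print(f"L_D(h_S) = {L_D[h_S]:.4f} <= {best:.4f} + {eps:.4f}")
```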
In other words, Lemma 3 shows that the ERM rule satisfies the agnostic PAC condition whenever $S$ is $\frac{\epsilon}{2}$-representative. Hence, if a sufficiently large training set is guaranteed to be $\frac{\epsilon}{2}$-representative with high probability, then the ERM rule is an agnostic PAC learner. This guarantee is called the uniform convergence property.
Definition 4 [Uniform convergence, [1]]
A hypothesis class \( \mathcal{H} \) has the uniform convergence property if there exists a function \( n^{\text{UC}}_{\mathcal{H}}:(0,1)^2\rightarrow\mathbb{N} \) such that, for every \( \epsilon,\delta \in (0,1) \) and every distribution \( \mathcal{D} \), a training set \( S \) of \( n \geq n^{\text{UC}}_{\mathcal{H}}(\epsilon,\delta) \) i.i.d. samples drawn from \( \mathcal{D} \) is \( \epsilon \)-representative with probability at least \( 1-\delta \).
Thus, instead of proving agnostic PAC learnability directly, it now suffices to show that the uniform convergence property holds.
For a finite hypothesis class, agnostic PAC learnability can be proved in much the same way as in the realizable PAC learning setting, as recorded below.
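For completeness, the standard route in [1] applies Hoeffding's inequality to each fixed \(h\) and then a union bound over the \(|\mathcal{H}|\) hypotheses; for a loss bounded in \([0,1]\) this gives
\begin{align}
n^{\text{UC}}_{\mathcal{H}}(\epsilon,\delta) \leq \left\lceil \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^{2}} \right\rceil, \qquad
n_{\mathcal{H}}(\epsilon,\delta) \leq n^{\text{UC}}_{\mathcal{H}}\!\left(\tfrac{\epsilon}{2},\delta\right) \leq \left\lceil \frac{2\log(2|\mathcal{H}|/\delta)}{\epsilon^{2}} \right\rceil, \nonumber
\end{align}
so every finite class has the uniform convergence property and is therefore agnostic PAC learnable by Lemma 3.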
References
[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.