The outline of this post is as follows.
1. Growth function and VC-dimension
2. $\epsilon$-net theorem and PAC learnability
1. Growth function and VC-dimension
Definition 1 (Growth function, [1]).
Let \( \mathcal{H} \) be a hypothesis class. Then the growth function of \( \mathcal{H} \), \( \tau_{\mathcal{H}} : \mathbb{N}\to\mathbb{N} \), is defined as:
\begin{align*}
\tau_{\mathcal{H}} (m) := \max_{C\subset\mathcal{X}:|C|=m} | \mathcal{H}_C|.
\end{align*}
In other words, the growth function counts the maximum number of distinct labelings of an $m$-point set $C\subset\mathcal{X}$ realizable by the hypothesis class $\mathcal{H}$, where $\mathcal{H}_C$ denotes the restriction of $\mathcal{H}$ to $C$. A few properties follow immediately; for example, for a finite hypothesis class, $\tau_{\mathcal{H}} (m)\leq|\mathcal{H}|$ for every $m$.
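As a quick sanity check, here is a minimal sketch (my own illustration, not from [1]) that brute-forces the growth function of the threshold class $h_t(x)=\mathbb{1}[x\geq t]$ on $m$ distinct reals; for this class, $\tau_{\mathcal{H}}(m)=m+1$.

```python
# A hedged sketch: brute-force the growth function of the threshold class
# h_t(x) = 1[x >= t] on a set of m reals. For this class tau_H(m) = m + 1.

def threshold_restrictions(points):
    """All distinct labelings of `points` realizable by some threshold t."""
    labelings = set()
    # One representative threshold below every point, plus one just above each.
    thresholds = [min(points) - 1.0] + [p + 0.5 for p in points]
    for t in thresholds:
        labelings.add(tuple(int(x >= t) for x in points))
    return labelings

for m in range(1, 6):
    C = [float(i) for i in range(m)]           # m distinct reals attain the max
    print(m, len(threshold_restrictions(C)))   # prints m, m + 1
```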
2. $\epsilon$-net theorem and PAC learnability
Definition 2 ($\epsilon$-net, [1]). Let $\mathcal{X}$ be a domain. A set $S\subset\mathcal{X}$ is an $\epsilon$-net for $\mathcal{H}\subset2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if
$$\forall{h}\in\mathcal{H} : \mathcal{D}(h) \geq \epsilon \rightarrow h\cap S \neq \emptyset$$
In other words, an $\epsilon$-net hits every hypothesis that is large under $\mathcal{D}$: any $h\in\mathcal{H}$ with probability mass at least $\epsilon$ must contain at least one point of $S$.
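For concreteness, here is a minimal sketch under illustrative assumptions of my own (this class and distribution are not prescribed by [1]): take $\mathcal{H}$ to be the closed intervals in $[0,1]$ and $\mathcal{D}=\mathrm{Uniform}[0,1]$, so $\mathcal{D}([a,b])=b-a$. Up to measure-zero ties, $S$ is an $\epsilon$-net exactly when no gap between consecutive sample points (or a boundary) has length at least $\epsilon$.

```python
import random

# Hedged sketch: H = closed intervals [a, b] in [0, 1], D = Uniform[0, 1],
# so D([a, b]) = b - a. Up to measure-zero ties, S fails to be an eps-net
# iff some gap between consecutive sample points or a boundary reaches eps.

def is_eps_net(sample, eps):
    pts = sorted([0.0] + list(sample) + [1.0])
    return all(b - a < eps for a, b in zip(pts, pts[1:]))

random.seed(0)
S = [random.random() for _ in range(200)]
print(is_eps_net(S, eps=0.1))   # True with high probability at this sample size
```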
Theorem 3 ([1]). Let $ \mathcal{H} \subset 2^{\mathcal{X}}$ and $\text{VCdim}(\mathcal{H})=d$. There exists $m = m(d, \epsilon, \delta)$ such that for any training set size $|S|\geq m$, with probability at least $1-\delta$ over the choice of $ S \sim\mathcal{D}^m$, $S$ is an $\epsilon$-net for $\mathcal{H}$.
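Continuing the interval example above (again an illustrative assumption of mine, not the theorem's full generality; intervals have VC-dimension 2), the following sketch estimates the failure probability empirically and shows it dropping as $m$ grows:

```python
import random

# Sketch simulation of Theorem 3 for the illustrative interval class under
# Uniform[0, 1]; the class choice is my assumption, not [1]'s statement.

def is_eps_net(sample, eps):                   # same check as in the previous sketch
    pts = sorted([0.0] + list(sample) + [1.0])
    return all(b - a < eps for a, b in zip(pts, pts[1:]))

random.seed(1)
eps, trials = 0.1, 1000
for m in (20, 50, 100, 200):
    fails = sum(not is_eps_net([random.random() for _ in range(m)], eps)
                for _ in range(trials))
    print(m, fails / trials)   # empirical P(S is not an eps-net) drops with m
```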
Theorem 4 ([1]). Let $ \mathcal{H}$ be a hypothesis class over $\mathcal{X}$ with $\text{VCdim}(\mathcal{H})=d$. Let $\mathcal{D}$ be a distribution over $\mathcal{X}$ and let $c\in\mathcal{H}$ be a target hypothesis. For $m\geq m^{\text{PAC}}(\epsilon,\delta)$, with probability at least \( 1- \delta \) over the choice of \(m\) i.i.d. instances from $\mathcal{X}$, every ERM hypothesis has true error at most $\epsilon$.
Proof
It suffices to show that the VC-dimension of $\mathcal{H}^c:=\{c \bigtriangleup h : h\in\mathcal{H} \}$ equals that of $\mathcal{H}$, and then apply Theorem 3 to $\mathcal{H}^c$.
Since $L_{\mathcal{D}}(h)=\mathcal{D}(h \bigtriangleup c)$, and with probability at least $1-\delta$ the sample $S$ is an $\epsilon$-net for $\mathcal{H}^c$, observe that an ERM hypothesis $h$ agrees with $c$ on all of $S$, so $(h \bigtriangleup c)\cap S =\emptyset$; the $\epsilon$-net property then forces $L_{\mathcal{D}}(h)<\epsilon$.
Q.E.D.
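To see Theorem 4 in action, here is a hedged sketch in a setup of my own choosing (target $c(x)=\mathbb{1}[x\geq 0.5]$, $\mathcal{H}$ the threshold functions, $\mathcal{D}=\mathrm{Uniform}[0,1]$): any consistent threshold is an ERM hypothesis, and its true error $\mathcal{D}(h\bigtriangleup c)=|\hat{t}-0.5|$ shrinks with $m$.

```python
import random

# Hedged sketch of Theorem 4 (setup is my own illustration, not [1]'s):
# target c(x) = 1[x >= 0.5], H = threshold functions, D = Uniform[0, 1].
# Any consistent threshold t_hat has true error D(h triangle c) = |t_hat - 0.5|.

random.seed(2)
for m in (10, 100, 1000):
    xs = [random.random() for _ in range(m)]
    ys = [int(x >= 0.5) for x in xs]
    lo = max((x for x, y in zip(xs, ys) if y == 0), default=0.0)
    hi = min((x for x, y in zip(xs, ys) if y == 1), default=1.0)
    t_hat = (lo + hi) / 2        # one ERM hypothesis: zero empirical error
    print(m, abs(t_hat - 0.5))   # true error shrinks as m grows
```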
In the agnostic case, one first proves the uniform convergence property and then derives PAC learnability from it.
Theorem 5 ([1]). Let $\mathcal{H}$ be a hypothesis class with $\text{VCdim}(\mathcal{H})=d$. Then $\mathcal{H}$ is agnostic PAC learnable with sample complexity
$$m_{\mathcal{H}}(\epsilon,\delta)\leq C_2 \frac{d+\log(1/\delta)}{\epsilon^2}$$
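Plugging numbers into this bound (note that $C_2$ is an unspecified absolute constant in [1]; taking $C_2=1$ below is purely an illustrative assumption):

```python
from math import ceil, log

# Evaluate the agnostic sample-complexity bound. C_2 is an unspecified
# absolute constant in [1]; C2 = 1 here is an assumption for illustration.

def m_agnostic(d, eps, delta, C2=1.0):
    return ceil(C2 * (d + log(1.0 / delta)) / eps ** 2)

print(m_agnostic(d=3, eps=0.05, delta=0.05))   # 2399 with these inputs
```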
Finally, the converse direction shows the relationship between VC-dimension and PAC learnability.
Theorem 6 ([1]). Let $\mathcal{H}$ be a hypothesis class of functions from $\mathcal{X}$ to $\{0,1\}$. Let $m$ be a training set size and suppose there exists a set $C\subset\mathcal{X}$ of size $2m$ that is shattered by $\mathcal{H}$. Then, for any learning algorithm $A$, there exist a distribution $\mathcal{D}$ over $\mathcal{X}\times\{0,1\}$ and a hypothesis $h\in\mathcal{H}$ with $L_{\mathcal{D}}(h)=0$, but with probability at least $1/7$ over the choice of $S\sim\mathcal{D}^m$ we have $L_{\mathcal{D}}(A(S))\geq 1/8$.
In particular, if the VC-dimension of $\mathcal{H}$ is infinite, a shattered set of size $2m$ exists for every $m$, so $\mathcal{H}$ is not PAC learnable.
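The following hedged simulation mirrors the No-Free-Lunch argument in a concrete setup of my own (not the proof in [1]): with $\mathcal{H}$ all binary functions on a shattered set $C$ of size $2m$ and $\mathcal{D}$ uniform on $C$, a learner that memorizes its $m$ samples must guess on roughly half of $C$.

```python
import random

# Hedged simulation of the No-Free-Lunch argument; this concrete setup is
# my illustration, not the proof in [1]. C = {0, ..., 2m-1} is shattered by
# H = all binary functions on C, D is uniform on C, and labels come from a
# random f in H, so L_D(f) = 0. A memorizer must guess on unseen points.

random.seed(3)
m, trials = 10, 2000
bad = 0
for _ in range(trials):
    f = {x: random.randint(0, 1) for x in range(2 * m)}   # true labeling
    S = [random.randrange(2 * m) for _ in range(m)]       # m i.i.d. uniform samples
    seen = {x: f[x] for x in S}
    errors = sum(seen.get(x, 0) != f[x] for x in range(2 * m))  # predict 0 if unseen
    bad += (errors / (2 * m)) >= 1 / 8
print(bad / trials)   # well above 1/7 in this setup
```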
References
[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.