

VC-dimension

The contents of this post are as follows.

1. Growth function and VC-dimension

2. $\epsilon$-net theorem and PAC learnability

1. Growth function and VC-dimension

Definition 1 (Growth function, [1])
 Let \( \mathcal{H} \) be a hypothesis class. Then the growth function of \( \mathcal{H} \), \( \tau_{\mathcal{H}} : \mathbb{N}\to\mathbb{N} \) is defined as:
 \begin{align*}
 \tau_{\mathcal{H}} (m) := \max_{C\subset\mathcal{X}:|C|=m} | \mathcal{H}_C|.
 \end{align*}

 That is, the growth function counts, over all sets $C\subset\mathcal{X}$ of size $m$, the maximum number of distinct labelings of $C$ that hypotheses in $\mathcal{H}$ can produce (here $\mathcal{H}_C$ denotes the restriction of $\mathcal{H}$ to $C$). A few properties follow immediately; for example, for a finite hypothesis class, $\tau_{\mathcal{H}}(m)\leq|\mathcal{H}|$ for every $m$. The VC-dimension of $\mathcal{H}$, $\text{VCdim}(\mathcal{H})$, is the size of the largest set $C\subset\mathcal{X}$ shattered by $\mathcal{H}$, i.e., the largest $m$ for which $\tau_{\mathcal{H}}(m)=2^m$.
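
Since the growth function and the VC-dimension are purely combinatorial, they can be computed by brute force for a toy class. Below is a minimal sketch (my own illustration, not code from [1]) that does this for threshold functions on a small finite domain; the class and the domain are assumptions made only for demonstration.

```python
# Brute-force growth function and VC-dimension for a toy hypothesis class.
# H = threshold functions h_t(x) = 1[x >= t] on X = {0,...,9} (illustrative choice).
from itertools import combinations

X = list(range(10))                        # finite domain
H = [[1 if x >= t else 0 for x in X]       # each hypothesis stored as its label vector on X
     for t in range(11)]

def growth(m):
    """tau_H(m): max number of distinct labelings H induces on any C subset of X with |C| = m."""
    best = 0
    for C in combinations(range(len(X)), m):
        labelings = {tuple(h[i] for i in C) for h in H}
        best = max(best, len(labelings))
    return best

# VC-dimension: the largest m with tau_H(m) = 2^m, i.e. some C of size m is shattered.
vcdim = max(m for m in range(len(X) + 1) if growth(m) == 2 ** m)
print([growth(m) for m in range(1, 5)])    # thresholds give tau_H(m) = m + 1
print("VCdim =", vcdim)                    # 1 for thresholds
```

For this class $\tau_{\mathcal{H}}(m)=m+1$, which falls below $2^m$ as soon as $m\geq 2$, so the brute-force search reports a VC-dimension of 1.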

2. $\epsilon$-net theorem and PAC learnability

Definition 2 ($\epsilon$-net, [1]) Let $\mathcal{X}$ be a domain. $S\subset\mathcal{X}$ is an $\epsilon$-net for $\mathcal{H}\subset2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if

$$\forall{h}\in\mathcal{H} : \mathcal{D}(h) \geq \epsilon \rightarrow h\cap S \neq \emptyset$$

That is, an $\epsilon$-net $S$ intersects every hypothesis whose probability mass under $\mathcal{D}$ is at least $\epsilon$: no "$\epsilon$-large" set in $\mathcal{H}$ can avoid $S$.
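
The definition can be checked mechanically on a finite domain. The following sketch (the distribution $\mathcal{D}$, the sets in $\mathcal{H}$, and the samples $S$ are all hypothetical toy choices, not taken from [1]) tests whether a given $S$ is an $\epsilon$-net.

```python
# Check whether S is an eps-net for a collection H of subsets of a finite domain,
# under an explicitly given distribution D (toy, illustrative values).
def is_eps_net(S, H, D, eps):
    S = set(S)
    for h in H:
        mass = sum(D[x] for x in h)        # D(h), the probability mass of h
        if mass >= eps and not (set(h) & S):
            return False                   # an "eps-large" set was missed by S
    return True

D = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}       # distribution over X = {0,1,2,3}
H = [{0}, {1, 2}, {3}, {0, 3}]             # hypotheses viewed as subsets of X
print(is_eps_net({0, 1}, H, D, eps=0.25))  # True: every set with mass >= 0.25 meets {0,1}
print(is_eps_net({3}, H, D, eps=0.25))     # False: {0} has mass 0.4 but misses {3}
```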

Theorem 3 ([1]). Let $\mathcal{H}\subset 2^{\mathcal{X}}$ with $\text{VCdim}(\mathcal{H})=d$. There exists $m=m(\epsilon,\delta,d)$ such that for any training set with $|S|\geq m$, with probability at least $1-\delta$ over the choice of $S\sim\mathcal{D}^m$, $S$ is an $\epsilon$-net for $\mathcal{H}$.

Theorem 4 ([1]). Let $\mathcal{H}$ be a hypothesis class over $\mathcal{X}$ with $\text{VCdim}(\mathcal{H})=d$. Let $\mathcal{D}$ be a distribution over $\mathcal{X}$ and let $c\in\mathcal{H}$ be the target hypothesis. For $m\geq m^{\text{PAC}}(\epsilon,\delta)$, with probability at least \( 1-\delta \) over $m$ i.i.d. instances drawn from $\mathcal{D}$ and labeled by $c$, every ERM hypothesis has true error at most $\epsilon$.

Proof

It suffices to show that the VC-dimension of $\mathcal{H}^c:=\{c \bigtriangleup h : h\in\mathcal{H} \}$ equals the VC-dimension of $\mathcal{H}$, so that Theorem 3 applies to $\mathcal{H}^c$.

Since $L_{\mathcal{D}}(h)=\mathcal{D}(h \bigtriangleup c)$ and, with probability at least $1-\delta$, $S$ is an $\epsilon$-net for $\mathcal{H}^c$, any $h$ with $(h \bigtriangleup c)\cap S=\emptyset$ satisfies $L_{\mathcal{D}}(h)<\epsilon$. An ERM hypothesis $h_S$ makes no errors on $S$, so $(h_S \bigtriangleup c)\cap S=\emptyset$ and hence $L_{\mathcal{D}}(h_S)<\epsilon$.

Q.E.D.
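
To make the argument concrete, here is a small numerical sketch (my own illustration under strong simplifying assumptions: $\mathcal{H}$ is the class of thresholds on a finite domain, $\mathcal{D}$ is uniform, and the target is a threshold; none of this is code from [1]). An ERM hypothesis is consistent with $S$, so its symmetric difference with $c$ misses $S$, and whenever $S$ is an $\epsilon$-net for $\mathcal{H}^c$ this forces a true error below $\epsilon$.

```python
# Realizable ERM on a toy class: sample S from D, label it with the target c,
# pick a hypothesis with zero empirical error, and measure its true error.
import random

random.seed(0)
X = list(range(100))
D = {x: 1 / len(X) for x in X}             # uniform distribution over X
c_thr = 37                                  # target concept c(x) = 1[x >= 37]

def h(t, x):                                # threshold hypothesis h_t
    return 1 if x >= t else 0

def true_error(t):                          # L_D(h_t) = D(h_t symmetric-difference c)
    return sum(D[x] for x in X if h(t, x) != h(c_thr, x))

m, eps = 200, 0.05
S = [random.choice(X) for _ in range(m)]    # i.i.d. sample from D
labels = [(x, h(c_thr, x)) for x in S]      # labeled by the target c

# ERM: any threshold with zero empirical error on S (the target itself always qualifies).
erm_t = next(t for t in range(len(X) + 1)
             if all(h(t, x) == y for x, y in labels))
print("ERM threshold:", erm_t, "true error:", true_error(erm_t))  # typically below eps
```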

In the agnostic case, one first proves the uniform convergence property.

Theorem ([1]). $\mathcal{H}$ is agnostic PAC learnable with sample complexity

$$m_{\mathcal{H}}(\epsilon,\delta)\leq C_2 \frac{d+\log(1/\delta)}{\epsilon^2}$$
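
To get a feel for the scale this bound implies, here is a tiny helper that plugs in numbers; the constant $C_2$ is left unspecified in the statement, so it is set to 1 below purely for illustration.

```python
# Evaluate the agnostic sample-complexity upper bound C2 * (d + log(1/delta)) / eps^2.
from math import ceil, log

def m_agnostic_upper(d, eps, delta, C2=1.0):
    return ceil(C2 * (d + log(1 / delta)) / eps ** 2)

print(m_agnostic_upper(d=10, eps=0.1, delta=0.05))  # about 1300 samples for these values
```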


Finally, we can see the relationship between the VC-dimension and PAC learnability.

Theorem ([1]). Let $\mathcal{H}$ be a hypothesis class of functions from $\mathcal{X}$ to $\{0,1\}$ and let $A$ be any learning algorithm. Let $m$ be a training set size, and suppose there exists a set $C\subset\mathcal{X}$ of size $2m$ that is shattered by $\mathcal{H}$. Then there exist a distribution $\mathcal{D}$ over $\mathcal{X}\times\{0,1\}$ and a hypothesis $h\in\mathcal{H}$ with $L_{\mathcal{D}}(h)=0$, but with probability at least $1/7$ over the choice of $S\sim\mathcal{D}^m$ we have $L_{\mathcal{D}}(A(S))\geq 1/8$.


In other words, if the VC-dimension of $\mathcal{H}$ is infinite, then for every sample size $m$ there is a shattered set of size $2m$, so the theorem above shows that $\mathcal{H}$ is not PAC learnable.

References

[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
