Online learning algorithms

1. Follow the leader algorithm
2. Follow the regularized leader algorithm

먼저 Regret을 어떻게 정의했는지 다시 생각해보자:

$$\text{Regret}_A(\mathbf{w}^*,T)=\sum^T_{t=1}f_t(\mathbf{w}^{(t)})-\sum^T_{t=1}f_t(\mathbf{w}^*)$$

1. Follow the leader algorithm

Online convex optimization은 online learning에서 convex optimization 알고리즘을 이용하는 것이다. 그 중 대표적인 것이 Follow the leader algoirthm 이다 :

$$w^{(t)}=arg\min_{w \in \mathcal{H}} \sum^{t-1}_{i=1} l(w,z_i) $$

Follow the leader rule의 regret을 구하기 위한 보조정리를 먼저 소개한다:

Lemma For any $T>0$ , for all $\mathbf{w}^*\in W$,

$$ \sum^T_{t=1} f_t(\mathbf{w}^{(t+1)})\leq \sum^T_{t=1} f_t(\mathbf{w}^*)$$

Proof

Induction 으로 쉽게 증명을 할 수 있다. T=1일 경우, follw the leader rule의 정의에 의해 다음이 성립한다:

$$f_1 ( \mathbf{w}^{(2)}) = arg\min_{w\in W} f_1(\mathbf{w}) \leq f_1 ( \mathbf{w}^*).$$

T-1일 때 모든 $\mathbf{w}^*\in W$ 다음이 성립한다고 하자:

$$\sum^{T-1}_{t=1} f_t ( \mathbf{w}^{(t+1)}) \leq \sum^{T-1}_{t=1} f_t ( \mathbf{w}^*).$$

그러면, induction hypothesis에 의해 다음이 성립한다:

$$\sum^T_{t=1} f_t ( \mathbf{w}^{(t+1)})=f_T(\mathbf{w}^{(T+1)})+\sum^{T-1}_{t=1}f(\mathbf{w}^{(t+1)})\leq f_T( \mathbf{w}^{(T+1)}) + \sum^{T-1}_{t=1} f(\mathbf{w}^{(T+1)})=\min_{\mathbf{w} \in W} \sum^T_{t=1} f_t(\mathbf{w})$$

Follw the Leader의 대표적인 예시로 online convex opimization 문제가 있다.

이 때 loss function은 다음과 같이 정의된다: $$f_t(\mathbf{w}) = \frac{1}{2} \left\| \mathbf{w}-\mathbf{z}_t \right\|_2^2$$

하지만 이런 Follow the Leader이 실패하는 경우가 있어 Follw the Regularized Leader라는 알고리즘을 정의한다:

2. Follow the regularized leader algorithm

$$\mathbf{w}^{(t)}=arg\min_{\mathbf{w}\in W} \sum^{t-1}_{i=1} f_t(\mathbf{w})+ R(\mathbf{w}) $$

Lemma Let $\mathbf{w}_1,\mathbf{w}_2,\dots $ be the sequence of vectors produced by FoReL. Then, for all $\mathbf{u}\in S$, we get

$$\sum^T_{t=1}(f_t(\mathbf{w}_t)-f_t(\mathbf{u})) \leq R(\mathbf{u})-R(\mathbf{w}_1)+\sum^T_{t=1}(f_t(\mathbf{w}_t)-f_t(\mathbf{w}_{t+1}))$$

Follow the Leader에서 증명을 하듯, $f_0(\mathbf{w})=R(\mathbf{w})$라고 한다면, $T=1$일 때,

$$ f_1(\mathbf{w}_2) =arg\min_{\mathbf{w}\in S} f_1(\mathbf{w})+ R(\mathbf{w})$$

또한 $ f_0(\mathbf{w})=R(\mathbf{w})$라고 한다면,

$$\sum^T_{t=0} (f_t(\mathbf{w}_t)-f_t(\mathbf{u})) \leq \sum^T_{t=0} (f_t(\mathbf{w}_t)-f_t(\mathbf{w}_{t+1}))$$

$T=1$일 때,

$$ f_1(\mathbf{w}_1)-f_1(\mathbf{u}) + R(\mathbf{w}_0)-R(\mathbf{u}) \leq f_1(\mathbf{w}_1)-f_1(\mathbf{w}_2)+R(\mathbf{w}_0)-R(\mathbf{w}_2)$$

나머지 부분들은 induction을 활용해 증명할 수 있다.

Theorem (FoReL on linear function, [1])

$f_t(\mathbf{w})=\mathbf{w}^{\top}\mathbf{z}_t$와 $R(\mathbf{w})=\frac{1}{2\eta}\left\|\mathbf{w}\right\|^2_2$에 대해FoReL 알고리즘을 적용할 경우, regret bound는 다음과 같이 된다:

$$\text{Regret}_T(\mathbf{u})\leq \frac{1}{2\eta}\left\| \mathbf{u} \right\|_2^2+\eta \sum^T_{t=1} \left\|\mathbf{z}_t\right\|_2^2$$

Letting $\eta=\frac{B}{L\sqrt{2T}}$ we get

$$\text{Regret}_T(U)\leq BL\sqrt{2T}.$$

비슷하게 Projected onlines subgradient descent 알고리즘도 다음과 같은 regret bound를 가진다:

$$\text{Regret}_A(\mathbf{w}^*,T) \leq \frac{\left\| \mathbf{w}^* \right\|^2}{2\eta}+\frac{\eta}{2} \sum^T_{t=1} \left\| \mathbf{v}_t \right\|_2^2$$

저작자표시 (새창열림)

'Artificial Intelligence > Machine Learning' 카테고리의 다른 글

Online learning (0)	2023.06.16
Non-uniform learnability (0)	2023.04.18
PAC learning and uniform convergence (0)	2023.04.18
VC-dimension (0)	2023.04.17
No free lunch theorem (0)	2023.04.16

Stareering

Online learning algorithms

1. Follow the leader algorithm

2. Follow the regularized leader algorithm

'Artificial Intelligence > Machine Learning' 카테고리의 다른 글

티스토리툴바

Online learning algorithms

1. Follow the leader algorithm

2. Follow the regularized leader algorithm

'Artificial Intelligence > Machine Learning' 카테고리의 다른 글

'Artificial Intelligence/Machine Learning' Related Articles

티스토리툴바