Sufficient Statistics : Factorization Theorem

These notes summarize part of Chapter 6 of Casella and Berger.

Sufficiency Principle

A sufficient statistic for a parameter $\theta$ is a statistic that captures all the information about $\theta$ contained in the sample. Hence, once $T(\mathbf{X})$ is known, the rest of the sample carries no further information about $\theta$.

If $T(X)$ is a sufficient statistic for $\theta$, then any inference about $\theta$ should depend on the sample $X$ only through the value $T(X)$. That is, if $x$ and $y$ are two sample points such that $T(x)=T(y)$, then the inference about $\theta$ should be the same whether $X=x$ or $X=y$ is observed.

Data reduction through $T$ can therefore be viewed as a partition of the sample space $\mathcal{X}$ into the sets:

$$ A_t = \{ \mathbf{x} \mid t = T(\mathbf{x}) \}.$$
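The partition can be made concrete with a small sketch. It assumes, purely for illustration, a sample space of $n = 3$ binary observations and the statistic $T(\mathbf{x}) = x_1 + x_2 + x_3$; the helper name `partition_by_statistic` is hypothetical.

```python
from collections import defaultdict
from itertools import product

def partition_by_statistic(sample_space, statistic):
    """Group sample points x into the sets A_t = {x : T(x) = t}."""
    cells = defaultdict(list)
    for x in sample_space:
        cells[statistic(x)].append(x)
    return dict(cells)

# Illustrative assumption: n = 3 binary observations, T(x) = x_1 + x_2 + x_3.
sample_space = list(product([0, 1], repeat=3))
cells = partition_by_statistic(sample_space, sum)

for t in sorted(cells):
    print(t, cells[t])
```

Each sample point lands in exactly one set $A_t$, so reporting $T(\mathbf{x}) = t$ only tells us which cell of the partition the sample fell into.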

Definition 6.2.1 (Sufficient statistic)

A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.

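For a discrete sample, with $p$ the pmf of $X$ and $q$ the pmf of $T(X)$, this conditional probability is

$P_{\theta}(X=x \mid T(X)=T(x)) = \frac{P_{\theta}(X=x \text{ and } T(X)=T(x))}{P_{\theta}(T(X)=T(x))} = \frac{P_{\theta}(X=x)}{P_{\theta}(T(X)=T(x))} = \frac{p(x\mid\theta)}{q(T(x)\mid\theta)}$

The second equality holds because the event $\{X=x\}$ is contained in the event $\{T(X)=T(x)\}$.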

Theorem 6.2.2

If $p(\mathbf{x}\mid\theta)$ is the joint pdf or pmf of $\mathbf{X}$ and $q(t\mid\theta)$ is the pdf or pmf of $T(\mathbf{X})$, then $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if, for every $\mathbf{x}$ in the sample space, the ratio $p(\mathbf{x}\mid\theta)/q(T(\mathbf{x})\mid\theta)$ is constant as a function of $\theta$.

This theorem follows directly from Definition 6.2.1.

Example 6.2.3 (Binomial sufficient statistic)

Let $X_1,X_2,\ldots,X_n$ be iid Bernoulli random variables with parameter $\theta$, and let $T(X)=X_1+X_2+\cdots+X_n$. Then $T(X)$ is a sufficient statistic for $\theta$.

In other words, $T(X)$ captures all the information about $\theta$ that the sample provides. Knowing the exact value of, say, $X_3$ adds no further information about $\theta$.
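As a numerical check of this example, the sketch below evaluates the ratio $p(x\mid\theta)/q(T(x)\mid\theta)$ from Theorem 6.2.2 for every binary sample of length $n = 4$. The sample size and the three $\theta$ values are arbitrary choices made here, and the function names are hypothetical.

```python
from itertools import product
from math import comb, isclose

def joint_pmf(x, theta):
    """Joint pmf of iid Bernoulli(theta) observations x."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)

def pmf_of_sum(t, n, theta):
    """Binomial(n, theta) pmf of T(X) = X_1 + ... + X_n."""
    return comb(n, t) * theta ** t * (1 - theta) ** (n - t)

def ratio(x, theta):
    """p(x | theta) / q(T(x) | theta) from Theorem 6.2.2."""
    return joint_pmf(x, theta) / pmf_of_sum(sum(x), len(x), theta)

n = 4  # arbitrary sample size for the check
for x in product([0, 1], repeat=n):
    values = [ratio(x, theta) for theta in (0.2, 0.5, 0.9)]
    # The ratio equals 1 / C(n, t) whatever theta is.
    assert all(isclose(v, 1 / comb(n, sum(x))) for v in values)
```

The powers of $\theta$ and $1-\theta$ cancel in the ratio, leaving $1/\binom{n}{t}$, which is why the assertion passes for every $\theta$ tried.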

Example 6.2.4 (Normal sufficient statistic)

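Similarly, for iid observations from a normal distribution with known variance $\sigma^2$, the sample mean $\bar{X}$ is a sufficient statistic for the population mean $\mu$.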

Both Example 6.2.3 and Example 6.2.4 can be verified by applying Theorem 6.2.2.
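For the normal case, the same check can be sketched numerically. The data values, the known variance `sigma2`, and the grid of $\mu$ values below are made up for illustration; the code relies only on the fact that $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$.

```python
from math import exp, isclose, pi, sqrt

def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def joint_pdf(xs, mu, sigma2):
    """Joint pdf of iid normal observations xs."""
    p = 1.0
    for x in xs:
        p *= normal_pdf(x, mu, sigma2)
    return p

def ratio(xs, mu, sigma2):
    """p(x | mu) / q(xbar | mu), where xbar ~ Normal(mu, sigma2 / n)."""
    n = len(xs)
    xbar = sum(xs) / n
    return joint_pdf(xs, mu, sigma2) / normal_pdf(xbar, mu, sigma2 / n)

xs = [0.3, -1.2, 2.5, 0.7]   # illustrative data
sigma2 = 1.5                 # variance assumed known
ratios = [ratio(xs, mu, sigma2) for mu in (-1.0, 0.0, 0.5, 2.0)]

# The ratio is the same for every mu, up to floating-point error.
assert all(isclose(r, ratios[0], rel_tol=1e-9) for r in ratios)
```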

Example 6.2.5 (Sufficient order statistics)

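Let $X_1,\ldots,X_n$ be iid from a pdf $f$ about which nothing else is known. The joint pdf of the sample is

$f(\mathbf{x}) = \prod_{i=1}^n f(x_i) = \prod_{i=1}^n f(x_{(i)})$

The joint pdf of the order statistics is $n!\prod_{i=1}^n f(x_{(i)})$ on $x_{(1)} < \cdots < x_{(n)}$, so the ratio of the joint pdf of the sample to the joint pdf of the order statistics is $\frac{1}{n!}$ (when the ordered values are strictly increasing), which does not depend on the unknown pdf $f$. Hence the order statistics are a sufficient statistic for $f$.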

Theorem 6.2.6 (Factorization Theorem)

Let $f(x|\theta)$ denote the joint pdf or pmf of a sample $X$. A statistic $T(X)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t|\theta)$ and $h(x)$ such that, for all sample points $x$ and all parameter points $\theta$,

$f(x|\theta)=g(T(x)|\theta)h(x)$

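Proof) (discrete case)

1. only if part : if $T(X)$ is a sufficient statistic, then $f(x|\theta)$ can be factorized.

Use Definition 6.2.1: the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$. Take $g(t|\theta)=P_{\theta}(T(X)=t)$ and $h(x)=P(X=x|T(X)=T(x))$. Then $f(x|\theta)=P_{\theta}(X=x)=P_{\theta}(T(X)=T(x))\,P(X=x|T(X)=T(x))=g(T(x)|\theta)h(x)$.

2. if part : if the factorization holds, then $T(X)$ is a sufficient statistic.

Use the fact that $q(T(x)|\theta) = \sum_{y\in A_{T(x)}} f(y|\theta)$, where $A_{T(x)}=\{y : T(y)=T(x)\}$. Substituting the factorization, $g(T(x)|\theta)$ cancels in the ratio $f(x|\theta)/q(T(x)|\theta)$, leaving $h(x)/\sum_{y\in A_{T(x)}} h(y)$, which does not depend on $\theta$. Sufficiency then follows from Theorem 6.2.2.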

Example 6.2.8 (Uniform sufficient statistic)

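Consider iid observations $X_1,X_2,\ldots,X_n$ from the discrete uniform distribution on $\{1,2,\ldots,\theta\}$. The joint pmf is $\theta^{-n}$ when

$x_i\in\{1,2,\ldots,\theta\}$ for all $i$

and $0$ otherwise. This restriction can be rewritten as $x_i\in\{1,2,\ldots\}$ for all $i$ and $\max_i x_i\leq\theta$, which makes it easy to write the joint pmf in the form required by the Factorization Theorem.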

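Let $T(X) = \max_i X_i$.

Then $f(x|\theta)=g(T(x)|\theta)h(x)$ with $g(t|\theta)=\theta^{-n}$ if $t\leq \theta$ and $0$ otherwise, and $h(x)=1$ if $x_i\in\{1,2,\ldots\}$ for all $i$ and $0$ otherwise.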

Therefore, by the Factorization Theorem, the largest order statistic $T(X)=\max_i X_i$ is a sufficient statistic for $\theta$.

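The same factorization can also be written with indicator functions: with $N=\{1,2,\ldots\}$ and $N_{\theta}=\{1,2,\ldots,\theta\}$, $f(x|\theta)=\theta^{-n}I_{N_{\theta}}(\max_i x_i)\prod_{i=1}^n I_{N}(x_i)$.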
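The factorization in this example can be checked by brute force. The sketch below compares the joint pmf with the product of a $\theta$-dependent factor `g` and a $\theta$-free factor `h` over a small grid; the grid bounds, the $\theta$ values, and the function names are arbitrary choices for illustration.

```python
from itertools import product
from math import isclose

def joint_pmf(x, theta):
    """Joint pmf of iid discrete uniform observations on {1, ..., theta}."""
    if all(1 <= xi <= theta for xi in x):
        return theta ** (-len(x))
    return 0.0

def g(t, theta, n):
    """Factor that depends on theta, and on x only through t = max_i x_i."""
    return theta ** (-n) if t <= theta else 0.0

def h(x):
    """Factor that does not depend on theta."""
    return 1.0 if all(isinstance(xi, int) and xi >= 1 for xi in x) else 0.0

n = 3  # arbitrary sample size for the check
for theta in (2, 4, 6):
    for x in product(range(1, 8), repeat=n):
        # f(x | theta) = g(T(x) | theta) * h(x) with T(x) = max_i x_i
        assert isclose(joint_pmf(x, theta), g(max(x), theta, n) * h(x))
```

Points with $\max_i x_i > \theta$ get probability zero on both sides, which is exactly the role of the condition $t \leq \theta$ in $g$.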

Vector Valued Statistics

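A sufficient statistic can be vector-valued. For example, for iid normal observations with both parameters unknown, $(\bar{X},s^2)$ is a sufficient statistic for $(\mu,\sigma^2)$.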
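One way to see this numerically is to compute the normal log joint density twice: once from the raw data and once from $(n, \bar{x}, s^2)$ alone, using the identity $\sum_i (x_i-\mu)^2 = (n-1)s^2 + n(\bar{x}-\mu)^2$. The data and parameter values below are illustrative assumptions, and the function names are hypothetical.

```python
from math import isclose, log, pi

def log_joint_pdf(xs, mu, sigma2):
    """Log joint pdf of iid Normal(mu, sigma2) observations, from raw data."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * log(2 * pi * sigma2) - ss / (2 * sigma2)

def summary(xs):
    """Return (n, sample mean, sample variance with divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return n, xbar, s2

def log_joint_pdf_from_summary(n, xbar, s2, mu, sigma2):
    """Same log density, computed from (n, xbar, s2) only."""
    ss = (n - 1) * s2 + n * (xbar - mu) ** 2
    return -0.5 * n * log(2 * pi * sigma2) - ss / (2 * sigma2)

xs = [1.2, 0.4, 3.1, 2.2, -0.5]  # illustrative data
n, xbar, s2 = summary(xs)
for mu, sigma2 in [(0.0, 1.0), (1.5, 0.5), (-2.0, 4.0)]:
    assert isclose(
        log_joint_pdf(xs, mu, sigma2),
        log_joint_pdf_from_summary(n, xbar, s2, mu, sigma2),
    )
```

Since the joint density depends on the data only through $(\bar{x}, s^2)$, the Factorization Theorem applies with $h(x) = 1$.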

Theorem 6.2.10

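For an exponential family, a sufficient statistic can be read off directly from the Factorization Theorem. Let $X_1,\ldots,X_n$ be iid observations from a pdf or pmf of the form

$f(x|\theta)= h(x)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta)t_i(x)\right)$

Then $T(X) = \left(\sum_{j=1}^n t_1(X_j),\ldots,\sum_{j=1}^n t_k(X_j)\right)$ is a sufficient statistic for $\theta$.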
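Computing $T(X)$ only requires the functions $t_1,\ldots,t_k$. The sketch below uses a hypothetical helper `sufficient_statistic` and assumes the normal family written with $t_1(x)=x$ and $t_2(x)=x^2$, which is one of several equivalent parameterizations.

```python
def sufficient_statistic(sample, t_functions):
    """Return (sum_j t_1(x_j), ..., sum_j t_k(x_j)) for the given sample."""
    return tuple(sum(t(x) for x in sample) for t in t_functions)

# Assumed parameterization of Normal(mu, sigma^2) as an exponential family:
# t_1(x) = x and t_2(x) = x^2.
normal_t = (lambda x: x, lambda x: x ** 2)

sample = [1.0, 2.0, 4.0]  # illustrative data
T = sufficient_statistic(sample, normal_t)
print(T)  # (7.0, 21.0)
```

Here $T$ is $(\sum_j x_j, \sum_j x_j^2)$, from which $(\bar{x}, s^2)$ can be recovered, consistent with the vector-valued example above.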


Stareering
