Search | Korea Science

Baek, Jangsun
- Journal of the Korean Statistical Society / v.27 no.4 / pp.515-529 / 1998
Burman (1987) and Hall and Titterington (1987) studied kernel smoothing for sparse multinomial data in detail. Both of their estimators of the cell probabilities are sparse asymptotically consistent under some restrictive conditions on the true cell probabilities. Dong and Simonoff (1994) adopted boundary kernels to relax these conditions. We propose estimating the cell probabilities with a local linear kernel estimator, which is popular in nonparametric regression. No boundary adjustment is necessary for this estimator, since it adapts automatically to estimation at the boundaries. We show that the estimator attains the optimal rate of convergence in mean sum of squared error under sparseness. Simulation results and a real data application are presented to illustrate the performance of the estimator.
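As a rough illustration of the idea, a local linear kernel estimate of the cell probabilities can be sketched as follows. This is not the paper's implementation: the design points $x_j = j/k$, the Epanechnikov kernel, and the function names `epanechnikov` and `local_linear_cell_probs` are assumptions made for the example. The estimate at cell $i$ is the intercept of a kernel-weighted least squares line fitted to the raw frequencies $N_j/n$.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on (-1, 1) (an assumed kernel choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_cell_probs(counts, h):
    """Local linear kernel estimates of cell probabilities (illustrative sketch).

    counts: ordered cell frequencies N_1, ..., N_k
    h:      bandwidth on the (0, 1] scale; must exceed 1/k so that every
            window contains at least two cells with positive weight
    """
    k = len(counts)
    n = sum(counts)
    x = [(j + 1) / k for j in range(k)]  # assumed design points x_j = j/k
    y = [c / n for c in counts]          # raw frequency estimates N_j/n
    estimates = []
    for xi in x:
        s0 = s1 = s2 = t0 = t1 = 0.0
        for xj, yj in zip(x, y):
            d = xj - xi
            w = epanechnikov(d / h)
            s0 += w
            s1 += w * d
            s2 += w * d * d
            t0 += w * yj
            t1 += w * d * yj
        # Intercept of the weighted least squares line through (d, y).
        estimates.append((s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1))
    return estimates


if __name__ == "__main__":
    sparse_counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 0]  # made-up example data
    print(local_linear_cell_probs(sparse_counts, h=0.35))
```

Because a straight line is fitted locally, the sketch needs no separate boundary kernel: the cells near either end are handled by the same formula. The estimates are not constrained to be nonnegative or to sum to one.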

Baek, Jang-Sun
- Journal of the Korean Statistical Society / v.33 no.3 / pp.313-321 / 2004
We consider the problem of testing cell probabilities in sparse multinomial data. Aerts et al. (2000) presented $T=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2$ as a test statistic based on the local least squares polynomial estimator $p_i^{*}$, and derived its asymptotic distribution. The local least squares estimator may produce negative estimates of the cell probabilities. The local maximum likelihood polynomial estimator $\hat{p}_i$, however, guarantees positive estimates and has the same asymptotic performance as the local least squares estimator (Baek and Park, 2003). When the cell probabilities differ greatly in size, giving the difference between the estimator and the hypothesized probability the same weight at every cell is not an appropriate way to measure overall goodness of fit. We instead consider a Pearson-type goodness-of-fit test statistic, $T_1=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2/p_i$, and show that it is asymptotically normal. We also investigate the asymptotic normality of $T_2=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2/p_i$ when the minimum expected cell frequency is very small.

Baek, Jang-Sun
- Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.303-311 / 2003
We consider the problem of testing cell probabilities in sparse multinomial data. Aerts et al. (2000) presented $T_1=\sum\limits_{i=1}^k(\hat{p}_i-p_i)^2$ as a test statistic based on the local polynomial estimator $\hat{p}_i$, and derived its asymptotic distribution. When the cell probabilities differ greatly in size, giving the difference between the estimator and the hypothesized probability the same weight at every cell is not an appropriate way to measure overall goodness of fit. We instead consider a Pearson-type goodness-of-fit test statistic, $T=\sum\limits_{i=1}^k(\hat{p}_i-p_i)^2/p_i$, and show that it is asymptotically normal.
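The difference between the two statistics can be seen in a minimal sketch that evaluates both formulas for a given vector of estimates and hypothesized probabilities. The function names and the example numbers are hypothetical, and the sketch computes only the raw statistics; it does not include the centering and scaling needed to use the asymptotic normal distribution.

```python
def unweighted_statistic(p_hat, p0):
    """T_1 = sum_i (p_hat_i - p_i)^2: every cell contributes on the same scale."""
    return sum((ph - p) ** 2 for ph, p in zip(p_hat, p0))


def pearson_type_statistic(p_hat, p0):
    """T = sum_i (p_hat_i - p_i)^2 / p_i: squared deviations scaled by 1/p_i."""
    return sum((ph - p) ** 2 / p for ph, p in zip(p_hat, p0))


if __name__ == "__main__":
    # Made-up example: one very small and one very large cell probability,
    # with the same absolute deviation (0.01) in both cells.
    p0 = [0.01, 0.99]
    p_hat = [0.02, 0.98]
    print(unweighted_statistic(p_hat, p0))
    print(pearson_type_statistic(p_hat, p0))
```

In the example both cells contribute equally to the unweighted sum, while the Pearson-type sum is dominated by the small cell, where the estimate is twice the hypothesized probability.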

Baek, Jang-Seon
- Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.29-34 / 2002
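Let $N=(N_{1},N_{2},\cdots,N_{k})^{T}$ be the vector of cell frequencies observed from a multinomial distribution with probability vector $p=(p_{1},p_{2},\cdots,p_{k})^{T}$, where $\sum\limits_{j=1}^{k}N_{j}=n$. When the total count $n$ is very small relative to the number of cells $k$, such discrete data are called sparse multinomial data. When the cells of sparse multinomial data are ordered, the probability $p_{i}$ of the $i$-th cell can be estimated by smoothing the frequency estimators $N_{j}/n$. The local polynomial kernel estimators based on the local least squares criterion, proposed by Aerts et al. (1997), Baek (1998), and others, have the desirable property of sparse asymptotic consistency, but they have the drawback that the probability estimates can be negative. To overcome this drawback, we propose a new kernel estimator based on the local maximum likelihood criterion and study its asymptotic properties.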