On the Use of Weighted k-Nearest Neighbors for Missing Value Imputation

  • Lim, Chanhui (Department of Biomedicine·Health Science, The Catholic University of Korea)
  • Kim, Dongjae (Department of Biomedicine·Health Science, The Catholic University of Korea)
  • Received : 2014.10.06
  • Accepted : 2014.11.10
  • Published : 2015.02.28

Abstract

Missing values are common in statistical analysis. Several methods exist for imputing them, and the k-nearest neighbor (KNN) method is an existing single imputation method. However, when one of the k nearest neighbors is an extreme value or outlier, KNN imputation can be biased. In this paper, we propose a weighted k-nearest neighbors (WKNN) imputation method that compensates for this weakness of KNN. A Monte Carlo simulation study using a real data set compares the WKNN method with the KNN method.
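
The contrast between plain and weighted KNN imputation can be illustrated with a minimal sketch. The abstract does not state the paper's weighting scheme, so the inverse-distance weights used here are an assumption chosen for illustration, not the authors' method. The function names (`knn_impute`, `_distance`), the `eps` parameter, and the toy data are likewise hypothetical.

```python
import math


def _distance(a, b):
    """Root-mean-square difference over coordinates observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, target, col, k=3, weighted=False, eps=1e-9):
    """Impute rows[target][col] from the k nearest rows that observe col.

    weighted=False: plain KNN, the unweighted mean of the donor values.
    weighted=True:  inverse-distance weighted mean (an assumed scheme,
                    not necessarily the one proposed in the paper).
    """
    donors = []
    for i, row in enumerate(rows):
        if i == target or row[col] is None:
            continue
        d = _distance(rows[target], row)
        if math.isfinite(d):
            donors.append((d, row[col]))
    donors.sort(key=lambda pair: pair[0])
    nearest = donors[:k]
    if not nearest:
        raise ValueError("no usable donor rows")
    if not weighted:
        return sum(v for _, v in nearest) / len(nearest)
    # eps guards against division by zero when a donor coincides with the target.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Toy data: None marks a missing value; row 0 is missing its third column.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.1, 10.0],
    [1.1, 2.0, 11.0],
    [3.0, 4.0, 50.0],  # third-nearest neighbor carrying an extreme value
    [9.0, 9.0, 12.0],
]

plain = knn_impute(rows, target=0, col=2, k=3)
weighted = knn_impute(rows, target=0, col=2, k=3, weighted=True)
print(plain, weighted)
```

With k = 3, the plain estimate is the mean of 10, 11, and 50 (about 23.7), while the weighted estimate stays near the two closest donors (about 11.2). Note that inverse-distance weights only discount an extreme donor when it is also farther away; an outlier that happens to be very close to the target would still dominate under this assumed scheme.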
