On the use of weighted adaptive nearest neighbors for missing value imputation

  • Yum, Yunjin (Department of Biomedicine & Health Science, The Catholic University of Korea)
  • Kim, Dongjae (Department of Biomedicine & Health Science, The Catholic University of Korea)
  • Received : 2018.06.04
  • Accepted : 2018.07.12
  • Published : 2018.08.31

Abstract

Among single imputation methods, k-nearest neighbors (KNN) imputation is widely used because it remains robust even when a parametric assumption such as multivariate normality does not hold. We propose a weighted adaptive nearest neighbors (WANN) imputation method that combines the strengths of two KNN variants: adaptive nearest neighbors (ANN) imputation, which accounts for local features of the data, and weighted k-nearest neighbors (WKNN) imputation, which is less sensitive to extreme values or outliers among the k nearest neighbors. We conducted a Monte Carlo simulation study to compare the performance of the proposed method with that of existing imputation methods.
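
The idea of combining a locally adapted neighborhood with distance-based weights can be illustrated with a minimal Python sketch. The abstract does not specify the weights or the rule used to adapt the neighborhood, so everything below is an assumption made for illustration: the function name `wann_impute_sketch`, the parameters `k_max` and `radius_factor`, Euclidean distance on the observed columns, donors restricted to complete rows, inverse-distance weights, and a rule that keeps only candidates within `radius_factor` times the nearest donor's distance. It is not the authors' algorithm.

```python
import numpy as np


def wann_impute_sketch(X, k_max=5, radius_factor=2.0, eps=1e-12):
    """Illustrative weighted, locally adaptive nearest-neighbor imputation.

    Assumptions (not taken from the paper): Euclidean distance on the
    columns observed in the target row, donors restricted to complete
    rows, inverse-distance weights, and a neighborhood of at most k_max
    donors lying within radius_factor times the nearest donor's distance.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    # Donor pool: rows with no missing values.
    complete = X[~np.isnan(X).any(axis=1)]
    if complete.shape[0] == 0:
        raise ValueError("need at least one complete row")

    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        # Distance to each donor, using only the columns observed in row i.
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        order = np.argsort(d)[:k_max]
        # Hypothetical adaptive step: drop candidates far outside the local scale.
        keep = order[d[order] <= radius_factor * d[order[0]] + eps]
        # Weighting step: closer donors receive larger weights.
        w = 1.0 / (d[keep] + eps)
        out[i, miss] = (w[:, None] * complete[keep][:, miss]).sum(axis=0) / w.sum()
    return out
```

Calling `wann_impute_sketch(X)` on a two-dimensional array with `np.nan` marking missing entries returns a copy with those entries filled in; observed values are left unchanged. Under these assumptions, a distant complete row is excluded from the neighborhood rather than merely down-weighted, which is one possible way to limit the influence of outliers.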
