A Study on the Efficiency of MCMC Multiple Imputation in Linear Discriminant Analysis (LDA)

  • Yoo, Hee-Kyung (Dept. of Computer Engineering, Samcheok Campus, Kangwon National University)
  • Kim, Myung-Cheol (Dept. of Industrial & Management Engineering, Samcheok Campus, Kangwon National University)
  • Published : 2009.09.30

Abstract

This study examines two imputation methods for handling missing values in discriminant analysis: the MCMC method and the EM algorithm. The performance of the two methods in linear (or quadratic) discriminant analysis is evaluated under various types of incomplete observations. Using simulation experiments, the effects of imputation by the EM algorithm and by the MCMC method are compared in terms of the probability of misclassification and the RMSE. The cases of incomplete observations are differentiated by missing rate, sample size, and distance between the two classification groups. The results show that both the probability of misclassification and the RMSE are lower for the EM algorithm than for the MCMC method, so imputation using the EM algorithm is the more efficient of the two. However, when the sample size is small and the missing rate is extremely high, omitting all observation vectors with missing values from the analysis gives a lower probability of misclassification than either the EM algorithm or the MCMC method.
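The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' code, and every setting in it is an assumption made for illustration: the function names (`em_impute`, `lda_fit`, `simulate`), the dimension `p`, the covariance matrix, the values removed completely at random, and the held-out test set used to estimate the misclassification probability. It covers only EM-based conditional-mean imputation and the omission of incomplete observations; the MCMC multiple imputation step is left out.

```python
import numpy as np

def em_impute(X, n_iter=50):
    """Fill NaNs with conditional means under one multivariate normal fitted by EM."""
    X = X.copy()
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    X[miss] = np.take(mu, np.where(miss)[1])      # start from column means
    sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        corr = np.zeros_like(sigma)               # conditional covariance of missing parts
        for i in np.where(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            if not o.any():                       # row with nothing observed
                X[i] = mu
                corr += sigma
                continue
            B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
            X[i, m] = mu[m] + B @ (X[i, o] - mu[o])                        # E-step
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        mu = X.mean(axis=0)                                               # M-step
        d = X - mu
        sigma = (d.T @ d + corr) / len(X)
    return X

def lda_fit(X0, X1):
    """Two-group linear discriminant rule with a pooled covariance matrix."""
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)
    return w, w @ (m0 + m1) / 2

def misclassification_rate(w, c, X, y):
    return float(np.mean((X @ w > c).astype(int) != y))

def simulate(n=50, delta=2.0, rate=0.2, p=3, seed=0):
    """One replicate: n per group, mean shift delta, missing rate `rate` (all assumed)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(0.5 * np.eye(p) + 0.5 * np.ones((p, p)))  # assumed covariance
    shift = np.zeros(p)
    shift[0] = delta

    def draw(k):
        return rng.normal(size=(k, p)) @ L.T, rng.normal(size=(k, p)) @ L.T + shift

    full = draw(n)
    holed = []
    for X in full:
        Xm = X.copy()
        Xm[rng.random(X.shape) < rate] = np.nan   # delete values completely at random
        holed.append(Xm)
    imputed = [em_impute(Xm) for Xm in holed]     # impute within each group
    mask = np.isnan(np.vstack(holed))
    rmse = np.sqrt(np.mean((np.vstack(imputed)[mask] - np.vstack(full)[mask]) ** 2))
    Xt = np.vstack(draw(2000))                    # large complete test set
    yt = np.repeat([0, 1], 2000)
    complete = [Xm[~np.isnan(Xm).any(axis=1)] for Xm in holed]
    return {"rmse_em": float(rmse),
            "err_em": misclassification_rate(*lda_fit(*imputed), Xt, yt),
            "err_omit": misclassification_rate(*lda_fit(*complete), Xt, yt)}

if __name__ == "__main__":
    print(simulate(n=50, delta=2.0, rate=0.2))
```

Calling `simulate` over a grid of `n`, `delta`, and `rate` with many seeds, then averaging the returned values, would approximate the comparison across missing rates, sample sizes, and group distances that the abstract describes.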
