Bias corrected imputation method for non-ignorable non-response

  • Lee, Min-Ha (Department of Statistics, Hankuk University of Foreign Studies)
  • Shin, Key-Il (Department of Statistics, Hankuk University of Foreign Studies)
  • Received : 2022.04.05
  • Accepted : 2022.06.07
  • Published : 2022.08.31

Abstract

Controlling total survey error, which comprises sampling error and non-sampling error, is very important in sampling design. Non-sampling error caused by non-response accounts for a large proportion of total survey error, and many studies have addressed how to handle non-response properly through imputation. Recently, many non-response imputation methods based on machine learning techniques, in addition to traditional statistical methods, have been studied and put into practical use. Most of these methods assume MCAR (missing completely at random) or MAR (missing at random). By contrast, few studies have focused on MNAR (missing not at random), also called non-ignorable non-response (NN), in which the response depends on the variable of interest; such non-response causes bias and greatly reduces the accuracy of imputation. In this study, we propose a non-response imputation method that can be applied under non-ignorable non-response. Specifically, the method estimates the bias caused by NN and then removes it, which improves the accuracy of the imputation results. The superiority of the proposed method is confirmed through small simulation studies.
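
The bias that motivates this work can be illustrated with a small simulation. The sketch below is not the authors' method: it only compares the respondent mean (the value that simple mean imputation would fill in) under MCAR and under a hypothetical MNAR mechanism. The standard normal variable of interest, the logistic response model, and the names `simulate` and `slope` are all assumptions made for illustration.

```python
import math
import random


def simulate(n=20000, slope=1.0, seed=0):
    """Return (true_mean, mcar_mean, mnar_mean) for one simulated sample.

    `slope` controls how strongly response depends on y under MNAR.
    The logistic response model is a hypothetical choice for illustration.
    """
    rng = random.Random(seed)
    # Variable of interest for the full sample (assumed standard normal).
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_mean = sum(y) / n

    # MCAR: every unit responds with the same probability 0.5.
    mcar = [v for v in y if rng.random() < 0.5]

    # MNAR: the response probability increases with y itself.
    mnar = [v for v in y if rng.random() < 1.0 / (1.0 + math.exp(-slope * v))]

    return true_mean, sum(mcar) / len(mcar), sum(mnar) / len(mnar)


true_mean, mcar_mean, mnar_mean = simulate()
mcar_bias = mcar_mean - true_mean
mnar_bias = mnar_mean - true_mean
print(f"MCAR bias of respondent mean: {mcar_bias:+.3f}")
print(f"MNAR bias of respondent mean: {mnar_bias:+.3f}")
```

Under these assumptions the respondent mean stays close to the full-sample mean when non-response is MCAR, but is shifted upward when units with larger values are more likely to respond. An imputation method for non-ignorable non-response has to estimate and remove a shift of this kind.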

Acknowledgement

This research was supported by the Hankuk University of Foreign Studies Research Fund of 2022.
