Sensitivity analysis of missing mechanisms for the 19th Korean presidential election poll survey

  • Kim, Seongyong (Division of Big Data and Management Engineering, Hoseo University) ;
  • Kwak, Dongho (Division of Big Data and Management Engineering, Hoseo University)
  • Received : 2018.09.14
  • Accepted : 2018.12.03
  • Published : 2019.02.28

Abstract

Non-responses are common in election poll surveys, and categorical data with non-responses can be represented as incomplete contingency tables. When candidates' support rates are estimated from such tables, the estimates depend on the assumed missing mechanism, so the mechanism should be identified before the analysis. However, recent studies have shown that the missing mechanism cannot be identified from the observed data alone. Sensitivity analysis, which can reflect a range of missing mechanisms, has been proposed to address this problem. The previously proposed sensitivity analysis, however, applies only to two-way incomplete contingency tables with binary variables. Because election poll surveys in Korea usually consider factors such as region, gender, and age, that method is of limited use for such data. In this paper, the sensitivity analysis is extended to multi-dimensional incomplete contingency tables and applied to poll survey data from the 19th Korean presidential election. The resulting intervals of estimates include the actual election results as well as the estimates obtained under various missing mechanisms. In addition, the properties of the missing mechanism that yields the estimates nearest to the actual election results are investigated; for those estimates, support for a candidate affects the occurrence of non-response.
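The dependence of the estimated support rate on the assumed missing mechanism can be illustrated with a deliberately simplified sketch. It is not the multi-dimensional method developed in the paper: it assumes a single binary outcome (supports the candidate or not), hypothetical counts, and one assumed sensitivity parameter `odds_ratio`, the ratio of nonresponse odds between supporters and non-supporters. A value of 1.0 corresponds to an ignorable mechanism, and other values correspond to nonignorable ones. The function names are illustrative only.

```python
def support_share(n_support, n_other, n_missing, odds_ratio):
    """Estimated support share under an assumed missing mechanism.

    odds_ratio is the (assumed) ratio of nonresponse odds for supporters
    relative to non-supporters; 1.0 means nonresponse is unrelated to support.
    """
    # Allocate nonrespondents to "support" in proportion to
    # odds_ratio * n_support versus n_other.
    missing_support = (
        n_missing * odds_ratio * n_support / (odds_ratio * n_support + n_other)
    )
    return (n_support + missing_support) / (n_support + n_other + n_missing)


def sensitivity_interval(n_support, n_other, n_missing, odds_ratios):
    """Range of estimated shares over a grid of assumed odds ratios."""
    shares = [
        support_share(n_support, n_other, n_missing, r) for r in odds_ratios
    ]
    return min(shares), max(shares)


# Hypothetical counts, not taken from the survey analysed in the paper.
n_support, n_other, n_missing = 400, 400, 200

for r in (0.5, 1.0, 2.0):
    print(r, round(support_share(n_support, n_other, n_missing, r), 4))

print(sensitivity_interval(n_support, n_other, n_missing, (0.5, 1.0, 2.0)))
```

With these made-up counts the estimate moves from about 0.47 to about 0.53 as the assumed odds ratio goes from 0.5 to 2.0, even though the observed data are unchanged. A sensitivity analysis reports such a range of estimates instead of committing to a single mechanism.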

Figure 4.1. Sensitivity plots.

Figure 4.2. Sensitivity plots sliced by $\beta_{jl \mid i}$.

Table 2.1. Survey result for the 19th presidential election in Seoul and Incheon/Gyeonggi-do

Table 3.1. Nonresponse models

Table 3.2. Estimated cell counts in Seoul

Table 4.1. Sensitivity analysis of Moon

Table 4.2. The estimates of MNAR model nearest to the election result
