Figure 2.1. An example of multiple imputation.
Figure 2.2. Scatter plots of the coefficient estimates from the analysis of the complete data (x-axis) and the mean of the coefficient estimates from the analysis of the multiply imputed data (y-axis).
Figure 3.1. An example of generating fully synthetic data.
Table 2.1. Sequential regression multivariate imputation simulation results
References
- Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms, Econometrica, 67, 251-333. https://doi.org/10.1111/1468-0262.00020
- Abowd, J. M. and Woodcock, S. D. (2001). Disclosure limitation in longitudinal linked data. In P. Doyle, J. Lane, J. Theeuwes, L. Zayatz (Eds.) Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies (pp. 215-277), Amsterdam, North Holland.
- Clyde, M. A. and Lee, H. K. H. (2001). Bagging and the Bayesian bootstrap. In T. Richardson and T. Jaakkola (Eds.) Artificial Intelligence and Statistics (pp. 169-174), Morgan Kaufmann, Burlington.
- Drechsler, J. (2018). Some clarifications regarding fully synthetic data. In Domingo-Ferrer, J., Montes, F. (Eds.) LNCS, (Vol. 11126, pp. 109-121), Springer, Heidelberg.
- Efron, B. (1979). Bootstrap methods: another look at the jackknife, Annals of Statistics, 7, 1-26. https://doi.org/10.1214/aos/1176344552
- Little, R. J. A. (1993). Statistical analysis of masked data, Journal of Official Statistics, 9, 407-426.
- Machanavajjhala, A., Kifer, D., Abowd, J., Gehrke, J., and Vilhuber, L. (2008). Privacy: theory meets practice on the map. In Proceedings of the 24th International Conference on Data Engineering, 277-286.
- Park, M. J. (2016). Comparative study on the recent SDC methods. Statistical Research Institute.
- Park, M. J. and Kim, H. (2016). Statistical disclosure control for public microdata: present and future, Korean Journal of Applied Statistics, 29, 1041-1059. https://doi.org/10.5351/KJAS.2016.29.6.1041
- Park, M. J. and Kim, J. (2017). Review on the synthetic data generation methodologies. Statistical Research Institute.
- Raab, G. M., Nowok, B., and Dibben, C. (2017). Practical data synthesis for large samples. Journal of Privacy and Confidentiality, 7, 67-97.
- Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J., and Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models, Survey Methodology, 27, 85-95.
- Raghunathan, T. E., Reiter, J. P., and Rubin, D. B. (2003). Multiple imputation for statistical disclosure limitation. Journal of Official Statistics, 19, 1-16.
- Reiter, J. P. (2002). Satisfying disclosure restrictions with synthetic data sets, Journal of Official Statistics, 18, 531-543.
- Reiter, J. P. (2003). Inference for partially synthetic, public use microdata sets, Survey Methodology, 29, 181-188.
- Reiter, J. P. (2004). Significance tests for multi-component estimands from multiply imputed, synthetic microdata, Journal of Statistical Planning and Inference, 131, 365-377. https://doi.org/10.1016/j.jspi.2004.02.003
- Rubin, D. B. (1978). Multiple imputations in sample surveys - a phenomenological Bayesian approach to nonresponse. In Proceedings of the Survey Research Methods Section, American Statistical Association, 20-34.
- Rubin, D. B. (1981). The Bayesian bootstrap, Annals of Statistics, 9, 130-134. https://doi.org/10.1214/aos/1176345338
- Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys, John Wiley & Sons, New York.
- Rubin, D. B. (1988). An overview of multiple imputation. In Proceedings of the Survey Research Methods Section, American Statistical Association, 79-84.
- Rubin, D. B. (1993). Discussion: statistical disclosure limitation, Journal of Official Statistics, 9, 461-468.