Imputation Method Using Local Linear Regression Based on Bidirectional k-nearest-components

  • Yonggeol Lee (Division of Software Convergence, Hanshin University)
  • Received: 2022.10.02
  • Accepted: 2022.10.30
  • Published: 2023.03.31

Abstract

This paper proposes an imputation method that applies local linear regression to components selected by a bidirectional k-nearest-components search. The search selects components over a dynamic range in both directions from the missing points, so, unlike existing methods that use a fixed-size window, the proposed method can flexibly select adjacent components in an imputation problem. The weights assigned to the components around the missing points are calculated using local linear regression. Local linear regression is free from the rank problem in the matrix of dependent variables, and it can calculate weights that reflect the data flow in a specific environment, such as a blackout. The missing values are estimated as a linear combination of the selected components and their weights, and the estimates are then used to impute the missing values. In the experiments, the proposed method outperformed existing methods when the error between the original data and the imputed data was measured using the mean absolute error (MAE) and the root mean square error (RMSE).
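
The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a univariate series with gaps marked as NaN, searches outward in both directions from each gap for up to k observed values per side (skipping other missing points, so the span adapts to the gap length), fits an ordinary least-squares line to those neighbours, and evaluates it at the gap. The function names `nearest_observed`, `impute_local_linear`, `mae`, and `rmse`, and the choice of position index as the regressor, are illustrative assumptions.

```python
import numpy as np


def nearest_observed(x, idx, k):
    """Return indices of up to k observed values on each side of idx (NaNs skipped)."""
    left, right = [], []
    i = idx - 1
    while i >= 0 and len(left) < k:
        if not np.isnan(x[i]):
            left.append(i)
        i -= 1
    j = idx + 1
    while j < len(x) and len(right) < k:
        if not np.isnan(x[j]):
            right.append(j)
        j += 1
    return np.array(left[::-1] + right, dtype=int)


def impute_local_linear(x, k=2):
    """Fill each NaN with a least-squares line fitted to its nearest observed neighbours."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for idx in np.flatnonzero(np.isnan(x)):
        nbrs = nearest_observed(x, idx, k)  # only original observations are used
        if nbrs.size == 0:
            continue  # nothing observed anywhere: leave the gap as NaN
        if nbrs.size == 1:
            out[idx] = x[nbrs[0]]  # a line cannot be fitted to a single point
            continue
        # Design matrix [1, position]; the fitted value is linear in the neighbour values.
        A = np.column_stack([np.ones(nbrs.size), nbrs.astype(float)])
        coef, *_ = np.linalg.lstsq(A, x[nbrs], rcond=None)
        out[idx] = coef[0] + coef[1] * idx
    return out


def mae(truth, estimate):
    """Mean absolute error between two equally shaped arrays."""
    return float(np.mean(np.abs(np.asarray(truth) - np.asarray(estimate))))


def rmse(truth, estimate):
    """Root mean square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((np.asarray(truth) - np.asarray(estimate)) ** 2)))


# Usage: a linear series with a three-point gap (a small "blackout").
truth = 2.0 * np.arange(10) + 1.0
lossy = truth.copy()
lossy[3:6] = np.nan
restored = impute_local_linear(lossy, k=2)
print(mae(truth, restored), rmse(truth, restored))
```

Because the neighbour search skips missing points rather than using a fixed window, a longer gap simply widens the span of positions the fit draws on. How the paper itself defines components, computes the weights, or handles multivariate data is not stated in the abstract and is not reproduced here.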

Acknowledgement

This work was supported by a Hanshin University Research Grant.
