Density Adaptive Grid-based k-Nearest Neighbor Regression Model for Large Dataset

  • Liu, Yiqi (College of Business Administration, Dongguk University)
  • Jung, Uk (College of Business Administration, Dongguk University)
  • Received : 2021.06.08
  • Accepted : 2021.06.14
  • Published : 2021.06.30

Abstract

Purpose: This paper proposes a density adaptive grid algorithm for the k-NN regression model that reduces computation time on large datasets without a significant loss of prediction accuracy. Methods: The proposed method uses grid cells represented by their centroids to reduce the number of reference data points, which in turn reduces the required computation time. Because the grid is generated from quantiles of the original variables, it reflects the density of the original reference dataset. Results: The proposed k-NN regression model is compared with the original k-NN regression model on five real-life datasets. The proposed density adaptive grid-based model outperforms the original k-NN regression in terms of data reduction ratio and time efficiency ratio, and yields a similar prediction error when an appropriate number of grids is selected. Conclusion: The proposed density adaptive grid algorithm for the k-NN regression model is simple and effective: it runs faster and requires less memory during the testing phase while avoiding a large loss of prediction accuracy.
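
The idea in the Methods sentence can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how many quantile bins are used per variable, how the response of a cell is computed, or which distance is used. The function names `build_quantile_grid` and `knn_predict`, the parameter `bins_per_dim`, the averaging of responses within each cell, and the Euclidean distance are therefore assumptions made for illustration only.

```python
import numpy as np

def build_quantile_grid(X, y, bins_per_dim):
    """Collapse (X, y) to one centroid per occupied quantile-grid cell.

    Assumption: each variable is cut at equally spaced quantile levels,
    and a cell is summarized by the mean of its points and responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Interior quantile levels, e.g. bins_per_dim=4 -> 0.25, 0.5, 0.75.
    levels = np.linspace(0.0, 1.0, bins_per_dim + 1)[1:-1]
    cell_ids = np.empty((n, d), dtype=int)
    for j in range(d):
        edges = np.quantile(X[:, j], levels)
        # Map each value to a bin index in [0, bins_per_dim - 1].
        cell_ids[:, j] = np.searchsorted(edges, X[:, j], side="right")
    # Points sharing the same bin index in every dimension form one cell.
    _, inverse = np.unique(cell_ids, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)
    centroids_X = np.zeros((counts.size, d))
    np.add.at(centroids_X, inverse, X)      # per-cell sums of the inputs
    centroids_X /= counts[:, None]          # per-cell means of the inputs
    centroids_y = np.bincount(inverse, weights=y) / counts
    return centroids_X, centroids_y

def knn_predict(ref_X, ref_y, query_X, k):
    """Plain k-NN regression: average the responses of the k nearest references."""
    ref_X = np.asarray(ref_X, dtype=float)
    ref_y = np.asarray(ref_y, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    k = min(k, ref_X.shape[0])
    # Squared Euclidean distances, shape (n_query, n_ref).
    d2 = ((query_X[:, None, :] - ref_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return ref_y[nearest].mean(axis=1)

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=5000)

gx, gy = build_quantile_grid(X, y, bins_per_dim=10)
pred = knn_predict(gx, gy, X[:5], k=3)
print(gx.shape[0], "centroids replace", X.shape[0], "reference points")
```

Because the cut points are quantiles, dense regions of each variable get narrow cells and sparse regions get wide ones, so the centroids follow the density of the original data. At prediction time, distances are computed against the centroids only, which is where the savings in time and memory come from in this sketch.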

Acknowledgement

This work was supported by the Dongguk University Research Fund of 2021.
