Pre-processing Method of Raw Data Based on Ontology for Machine Learning

  • Hwang, Chi-Gon (Department of Computer Engineering, Institute of Information Technology)
  • Yoon, Chang-Pyo (Department of Computer & Mobile Convergence, GyeongGi University of Science and Technology)
  • Received : 2020.04.16
  • Accepted : 2020.05.09
  • Published : 2020.05.31

Abstract

Machine learning builds an objective function from training data, checks that function against test data, and then uses it to predict outcomes for newly generated data. In machine learning, input data is normalized during preprocessing. Numerical data is standardized using the mean and standard deviation of the input data, while nominal (non-numerical) data is converted into a one-hot encoding. However, this preprocessing alone cannot resolve data-quality problems such as noise and heterogeneity. For this reason, this paper proposes a method that uses an ontology to normalize input data. As test data, we use received signal strength indicator (RSSI) values of Wi-Fi devices collected from mobile devices. Because these data contain noise and are heterogeneous, we refine them using the ontology.
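The two conventional preprocessing steps named in the abstract can be sketched as follows: z-score standardization, z = (x - mean) / std, for numerical data, and one-hot encoding for nominal data. This is a minimal illustration only, not the authors' ontology-based method. It assumes the population standard deviation (the convention used by scikit-learn's `StandardScaler`), and the function names `standardize` and `one_hot`, the RSSI readings, and the access-point labels are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: (x - mean) / std.

    Assumption: uses the population standard deviation (pstdev);
    the abstract does not say which variant the paper uses.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def one_hot(labels):
    """Encode nominal labels as 0/1 vectors over the sorted category set."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for label in labels:
        vec = [0] * len(categories)
        vec[index[label]] = 1  # mark the position of this label's category
        vectors.append(vec)
    return categories, vectors


if __name__ == "__main__":
    rssi = [-40, -50, -60, -70]  # hypothetical RSSI readings in dBm
    print(standardize(rssi))
    cats, vecs = one_hot(["ap_b", "ap_a", "ap_b"])  # hypothetical AP labels
    print(cats, vecs)
```

Neither step knows that two differently labeled records may describe the same device, or that an RSSI reading is an outlier. That gap is what the abstract says the ontology is meant to fill.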
