A Comparative Experiment on Dimensional Reduction Methods Applicable for Dissimilarity-Based Classifications

  • Kim, Sang-Woon (Department of Computer Engineering, Myongji University)
  • Received : 2015.11.23
  • Accepted : 2016.02.12
  • Published : 2016.03.25

Abstract

This paper presents an empirical evaluation of dimensionality reduction strategies by which dissimilarity-based classifications (DBC) can be implemented efficiently. In DBC, classification is based not on feature measurements of individual objects (a set of attributes) but on a suitable dissimilarity measure among the objects (pairwise object comparisons). One problem of DBC is the high dimensionality of the dissimilarity space when a large number of objects is treated. To address this issue, two kinds of solutions have been proposed in the literature: prototype selection (PS)-based methods and dimension reduction (DR)-based methods. PS-based methods construct the dissimilarity space from prototypes extracted from the whole training set, whereas DR-based methods first construct the dissimilarity space from the whole training set and then reduce its dimensionality. In this paper, instead of utilizing the PS-based or DR-based methods, a way of performing DBC in Eigen spaces (ES) is considered and empirically compared. In ES-based DBC, classification proceeds in three steps: first, a set of principal eigenvectors is extracted from the training data set using principal component analysis; second, an Eigen space is spanned by a selected subset of the extracted eigenvectors; third, distances among the objects projected into the Eigen space are measured using $l_p$-norms and used as the dissimilarity, and classification is performed. The experimental results, obtained using the nearest neighbor rule on publicly available artificial and real-life benchmark data sets, demonstrate that when the dimensionality of the Eigen space is selected appropriately, ES-based DBC can achieve higher classification accuracy than the PS-based and DR-based methods.
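
The three steps of ES-based DBC described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function names (`fit_eigenspace`, `project`, `lp_dissimilarity`, `es_dbc_predict`) are hypothetical, the eigenvectors are simply the leading principal axes rather than a tuned selection, every training object is assumed to serve as a representation object, and the nearest neighbor rule is assumed to compare rows of the dissimilarity matrix by Euclidean distance.

```python
import numpy as np


def fit_eigenspace(X_train, n_components):
    """Step 1-2: PCA via SVD; keep the leading principal axes (assumed selection)."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal directions ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, axes):
    """Map objects into the Eigen space spanned by the selected axes."""
    return (X - mean) @ axes.T


def lp_dissimilarity(A, B, p=2.0):
    """Step 3: pairwise l_p distances (p >= 1) between rows of A and rows of B."""
    diff = np.abs(A[:, None, :] - B[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)


def es_dbc_predict(X_train, y_train, X_test, n_components=2, p=2.0):
    """Classify test objects with 1-NN in the dissimilarity space (assumption)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    mean, axes = fit_eigenspace(X_train, n_components)
    Z_train = project(X_train, mean, axes)
    Z_test = project(X_test, mean, axes)

    # Each object is represented by its l_p distances to all training objects.
    D_train = lp_dissimilarity(Z_train, Z_train, p)
    D_test = lp_dissimilarity(Z_test, Z_train, p)

    # Nearest neighbor rule applied to the dissimilarity vectors.
    gaps = lp_dissimilarity(D_test, D_train, 2.0)
    return y_train[np.argmin(gaps, axis=1)]
```

In this sketch, `n_components` plays the role of the Eigen space dimensionality that the abstract says must be selected appropriately, and `p` selects the $l_p$-norm; both would normally be chosen by validation on held-out data.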
