Prototype based Classification by Generating Multidimensional Spheres per Class Area

  • Shim, Seyong (Dept. of Computer Science, Dankook University) ;
  • Hwang, Doosung (Dept. of Kinesiologic Medical Science & Computer Science, Dankook University)
  • Received : 2014.10.22
  • Accepted : 2015.01.26
  • Published : 2015.02.28

Abstract

In this paper, we propose prototype-based classification learning that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to partition the class region of the training data into spheres, each containing data of a single class only. A prototype is the center of a sphere, and its radius is set to the midpoint between the distance to the farthest same-class point and the distance to the nearest point of a different class. To determine the smallest set of prototypes that covers all the training data, we formulate prototype selection as a set covering problem. The proposed selection method is a greedy algorithm that can be applied to the training data of each class independently. Its computational complexity is low, and it is well suited to parallel processing of large training sets. The prototype-based classifier uses the selected prototypes as a new training set and predicts the class of test data with the nearest-neighbor rule. In experiments, the generalization performance of the proposed prototype classifier was superior to that of nearest-neighbor learning, the Bayes classifier, and another prototype-based classifier.
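
The selection and classification steps described above can be summarized in a short sketch. The Python code below is a minimal, illustrative implementation under assumptions not stated in the abstract: Euclidean distance, at least two classes, no identical points of different classes, and a sphere radius bounded so that only same-class points fall inside. The function names (build_spheres, greedy_cover, select_prototypes, predict) and the example data are hypothetical and do not come from the paper.

import numpy as np

def build_spheres(X, y):
    # For every training point, build a candidate sphere centered at it.
    # The radius is the midpoint between the distance to the nearest
    # point of a different class and the distance to the farthest
    # same-class point lying within that bound, so each sphere
    # contains only same-class data.
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    spheres = []
    for i in range(n):
        d_enemy = dist[i][y != y[i]].min()      # nearest other-class point
        same = dist[i][y == y[i]]
        d_friend = same[same < d_enemy].max()   # farthest same-class point within bound
        radius = (d_enemy + d_friend) / 2.0
        covered = set(np.where((y == y[i]) & (dist[i] <= radius))[0])
        spheres.append((i, radius, covered))
    return spheres

def greedy_cover(spheres, class_idx):
    # Greedy set cover: repeatedly pick the sphere that covers the most
    # not-yet-covered points of the class.
    uncovered = set(class_idx)
    chosen = []
    while uncovered:
        best = max(spheres, key=lambda s: len(s[2] & uncovered))
        chosen.append(best)
        uncovered -= best[2]
    return chosen

def select_prototypes(X, y):
    # Run the greedy cover separately for each class and collect the
    # chosen sphere centers as prototypes.
    spheres = build_spheres(X, y)
    prototypes = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        cand = [s for s in spheres if y[s[0]] == c]
        prototypes += greedy_cover(cand, idx)
    centers = np.array([X[i] for i, _, _ in prototypes])
    labels = np.array([y[i] for i, _, _ in prototypes])
    return centers, labels

def predict(centers, labels, X_test):
    # Classify test data by the nearest prototype (1-NN on the reduced set).
    d = np.linalg.norm(X_test[:, None, :] - centers[None, :, :], axis=-1)
    return labels[d.argmin(axis=1)]

# Example with hypothetical data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
centers, labels = select_prototypes(X, y)
print(len(centers), predict(centers, labels, np.array([[0.0, 0.0], [4.0, 4.0]])))

Because the cover is computed per class, the loops over classes can run independently, which is one way to read the paper's claim that the method parallelizes well on large training sets.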

Keywords
