Feature Subset Selection Algorithm based on Entropy

  • 홍석미 (Dept. of Computer Engineering, Kyung Hee University) ;
  • 안종일 (Dept. of Computer Software, Yongin Songdam College) ;
  • 정태충 (Dept. of Computer Engineering, Kyung Hee University)
  • Published : 2004.03.01

Abstract

Feature subset selection is often used as a preprocessing step for a learning algorithm. If the collected data contain irrelevant or redundant information, removing it before building the learning model can improve learning performance. Feature subset selection can also reduce the search space and the storage requirement. This paper proposes a new feature subset selection algorithm that uses an entropy-based heuristic function both to select features and to evaluate the performance of the extracted feature subsets. The ACS (Ant Colony System) algorithm was used as the search method. By reducing the dimensionality of the features used for learning, we were able to decrease the size of the learning model and eliminate unnecessary computation time.
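The entropy-based evaluation the abstract describes can be illustrated with a small information-gain scorer. This is a sketch of the general technique only, not the paper's exact heuristic function; the function names are ours.

```python
# Illustrative sketch: score a discrete feature by the reduction in class-label
# entropy (information gain) that conditioning on it achieves.
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy H(Y) = -sum_y p(y) * log2 p(y) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

# Toy data: this feature separates the classes perfectly, so the gain
# equals the full label entropy H(Y) = 1 bit.
x = ['a', 'a', 'b', 'b']
y = [0, 0, 1, 1]
print(round(information_gain(x, y), 6))  # prints 1.0
```

A feature whose gain is near zero contributes little to predicting the class and is a candidate for removal during subset selection.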
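The ACS search over feature subsets can be sketched as a pheromone-guided stochastic search. This is a simplified illustration under assumed parameter values, not the paper's exact ACS formulation; `acs_subset_search` and the toy fitness function are hypothetical.

```python
# Simplified ant-colony-style subset search: ants sample feature subsets with
# probabilities driven by per-feature pheromone; the best subset found so far
# reinforces its features after each iteration's evaporation step.
import random

def acs_subset_search(n_features, fitness, n_ants=10, n_iters=20,
                      rho=0.1, seed=0):
    rng = random.Random(seed)
    tau = [1.0] * n_features                  # pheromone per feature
    best_subset, best_fit = [], float('-inf')
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant includes feature i with probability tau_i / (tau_i + 1).
            subset = [i for i in range(n_features)
                      if rng.random() < tau[i] / (tau[i] + 1.0)]
            if not subset:
                continue
            f = fitness(subset)
            if f > best_fit:
                best_subset, best_fit = subset, f
        # Evaporate all trails, then reinforce the best subset found so far.
        tau = [(1.0 - rho) * t for t in tau]
        for i in best_subset:
            tau[i] += rho * max(best_fit, 0.0)
    return best_subset, best_fit

# Toy fitness: reward subsets containing features 0 and 2, penalize size,
# mimicking a heuristic that trades predictive value against dimensionality.
fit = lambda s: sum(1.0 for i in s if i in (0, 2)) - 0.1 * len(s)
best, score = acs_subset_search(6, fit)
print(sorted(best), round(score, 2))
```

In the paper's setting the fitness role would be played by the entropy-based heuristic function; here a toy objective keeps the sketch self-contained.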

References

  1. M. A. Hall. Correlation-based Feature Selection for Machine Learning, Ph. D diss. Hamilton, NZ: Waikato University, Department of Computer Science
  2. G. H. John, R. Kohavi, and P. Pfleger. Irrelevant features and the subset selection problem, In Proc. of 11th Int'l Conf. On Machine Learning, p 121-129, San Mateo, CA, 1994, Morgan Kaufmann
  3. K. kira and L. A. Rendell. The feature selection problem: Traditional methods and a new algorithm. In 10th National Conference on Artificial Intelligence, p 129-134. MIT Press, 1992
  4. D. Opitz. Feature selection for ensemble. In 16th National Conf. on Artificial Intellignecr (AAAI), pp 379-384, Orlando, FL, 1999
  5. Y -S Kim, W. N. Street and F. Menczer. Meta-Evolutionary Ensembles. In Proc. 2002 Int'l Joint Conf. on Neural Networks(IJCNN -02), pp 2791-2796, 2002 https://doi.org/10.1109/IJCNN.2002.1007590
  6. R. Beckers, J. L. Deneubourg and S. Goss, Trails and U-turns in the selection of the shortest path bye the ant Lasius Niger, Journal of Theoretical Biology, vol. 159, p 397-415, 1992 https://doi.org/10.1016/S0022-5193(05)80686-1
  7. A. Colomi, M. Dorgio and V. Maniezzo, Distributed optimization by ant colonies, Proceedings of ECAL91-Euripean Conference on Artificial Life, Paris, France, F. Vardla and P. Bourgine(Eds.), Elsevier Publishing, p 134-142, 1991
  8. J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-100, 1986
  9. Utgoff P. An improved algorithm for incremental induction of decision trees. In Proceedings of the Eleventh International Conference on Machine Learning, p 318-325, 1994
  10. 원동호 역, 정보와 부호이론, 도서출판 ohm, 1997
  11. UCI Repository of Machine Learning Data-bases.httpi//www.ics.uci.edu/~mleam/MLfiepository.html]