Performance Improvement of Feature Selection Methods based on Bio-Inspired Algorithms

A Scheme for Improving the Performance of Feature Selection Methods Based on Bio-Inspired Algorithms

  • Published : 2008.08.29

Abstract

Feature selection is a technique for improving classification accuracy in machine learning. Many feature selection algorithms have been proposed over the years, but finding the optimal feature subset of a full dataset remains difficult. Bio-inspired algorithms are well-known evolutionary algorithms modeled on the behavior of organisms; they are useful for finding optimal solutions to optimization problems and have also been applied to feature selection. In this paper we propose improved bio-inspired algorithms for feature selection. We use two well-known bio-inspired algorithms, the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to search for the feature subset that yields the best classification accuracy. We then modify these algorithms to take into account the prior importance (prior relevance) of each feature. To set the prior importance, we use the mRMR method, which can measure the goodness of a single feature, and we modify the evolution operators of GA and PSO accordingly. We verified the proposed methods experimentally on datasets. Feature selection with GA and PSO performed well in terms of classification accuracy, and the modified methods that use prior importance improved on the original GA and PSO in both evolution speed and classification accuracy.
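The abstract does not spell out how the GA operators are modified, so the following is only a minimal sketch of one plausible reading: a mutation operator whose flip probability is skewed by each feature's prior importance. The function names, the toy fitness, and the hand-set prior values are all hypothetical; in the paper's setting the prior would come from mRMR and the fitness from a classifier's accuracy.

```python
import random

def biased_mutation(mask, prior, rate, rng):
    """Flip bits with a probability skewed by each feature's prior importance."""
    child = []
    for bit, p in zip(mask, prior):
        # Assumption (not from the paper): an important feature is switched on
        # readily and switched off reluctantly; unimportant ones the reverse.
        flip_prob = rate * (p if bit == 0 else 1.0 - p)
        child.append(1 - bit if rng.random() < flip_prob else bit)
    return child

def uniform_crossover(a, b, rng):
    """Take each gene from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def ga_select(fitness, prior, pop_size=20, generations=30, rate=0.2, seed=0):
    """Search for a binary feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    n = len(prior)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        pop = [ranked[0]]                   # elitism: keep the current best
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            child = uniform_crossover(a, b, rng)
            pop.append(biased_mutation(child, prior, rate, rng))
        best = max(pop + [best], key=fitness)
    return best

# Toy stand-in for classifier accuracy: reward three "relevant" features
# and penalize every extra feature that is selected.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras

# Hand-set prior importance in [0, 1]; hypothetical values, not mRMR output.
toy_prior = [0.9, 0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.2]
best_mask = ga_select(toy_fitness, toy_prior)
print(best_mask, toy_fitness(best_mask))
```

Scaling the flip probability by the prior leaves the rest of the GA untouched, which makes it easy to compare against a plain GA by setting every prior to 0.5.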

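Feature selection is a method used in machine learning to improve classification performance. Although many methods have been developed and used, constructing an optimal feature subset from the full data remains a difficult problem. Bio-inspired algorithms are evolutionary algorithms built on principles such as the behavior of living organisms, and they are very useful for finding optimal solutions. Solutions based on bio-inspired algorithms have also been proposed for the feature selection problem, and this paper presents a way to improve feature selection methods that use them. To this end, we use well-known bio-inspired algorithms, the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to produce the feature subset with the best classification performance, and finally propose a method that improves the bio-inspired algorithms by setting a prior importance for each feature. For this we use a method called mRMR, which can compute the goodness of an individual feature. Using the prior importance set in this way, we modified the evolution operations of GA and PSO. We verified the performance of the proposed methods through experiments on data. The feature selection methods using GA and PSO showed excellent classification accuracy. The final method, improved with prior importance, was confirmed to outperform the original GA and PSO methods in both evolution speed and classification accuracy.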
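For the PSO side, a comparable sketch is a binary PSO in which each bit is resampled from a sigmoid of its velocity, with the prior importance shifting that probability. Both the prior-based initialization and the `bias` term are assumptions made here for illustration; the abstract only says that the evolution operators were modified using the prior importance. All names, parameter values, and the toy fitness are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpso_select(fitness, prior, swarm_size=15, iterations=30,
                w=0.7, c1=1.5, c2=1.5, bias=1.0, v_max=4.0, seed=0):
    """Binary PSO over feature masks, nudged by per-feature prior importance."""
    rng = random.Random(seed)
    n = len(prior)
    # Assumption: initial positions are sampled from the prior importance.
    pos = [[1 if rng.random() < p else 0 for p in prior]
           for _ in range(swarm_size)]
    vel = [[0.0] * n for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Assumption: the prior shifts the probability of selecting
                # feature d; a prior of 0.5 leaves standard binary PSO.
                p_on = sigmoid(vel[i][d] + bias * (2.0 * prior[d] - 1.0))
                pos[i][d] = 1 if rng.random() < p_on else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) > fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy stand-in for classifier accuracy, with hand-set prior values.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    return hits - 0.5 * (sum(mask) - hits)

toy_prior = [0.9, 0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.2]
gbest_mask = bpso_select(toy_fitness, toy_prior)
print(gbest_mask, toy_fitness(gbest_mask))
```

Setting `bias=0.0` and every prior to 0.5 recovers an unbiased binary PSO, which gives a baseline for judging whether the prior speeds up convergence on a given dataset.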
