Feature Subset Selection in the Induction Algorithm using Sensitivity Analysis of Neural Networks


  • 강부식 (Dept. of Management Information Systems, Mokwon University)
  • 박상찬 (Dept. of Industrial Engineering, KAIST)
  • Published: 2001.12.01

Abstract

In supervised machine learning, an induction algorithm, which extracts rules from data, is a useful tool for data mining. Practical induction algorithms are known to degrade in prediction accuracy and to generate unnecessarily complex rules when trained on data containing superfluous features, so feature subset selection is needed to improve their performance. In wrapper-style feature subset selection, the induction algorithm is run repeatedly on the dataset with various feature subsets; however, searching the whole space of subsets exhaustively is impractical unless the number of features is small. This study proposes a heuristic that combines sensitivity analysis of neural networks with the wrapper method to generate rules with as high an accuracy as possible. First, it ranks all features by importance using sensitivity analysis of a trained neural network. It then applies the wrapper method to search this ordered feature space. In experiments on three datasets, we show that the proposed method selects a feature subset that improves the performance of the induction algorithm within a bounded number of iterations.
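The paper itself gives no code, but the two-stage procedure the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scikit-learn's MLPClassifier stands in for the neural network, DecisionTreeClassifier for the rule-inducing algorithm, the finite-difference perturbation used to approximate sensitivity is our own choice, and the greedy forward scan is just one simple way to "search the ordered feature space"; the authors' actual sensitivity measure and search strategy may differ.

```python
# Hypothetical sketch of the two-stage method in the abstract:
# (1) rank features by neural-network sensitivity, (2) run a wrapper
# search over the ranked features. Stand-ins and the perturbation
# scheme are assumptions, not taken from the paper.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def sensitivity_ranking(X, y, delta=0.1, seed=0):
    """Rank features by how much perturbing each input changes the
    trained network's output (a crude finite-difference sensitivity).
    X is assumed to be a standardized numeric numpy array."""
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=seed).fit(X, y)
    base = net.predict_proba(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] += delta * X[:, j].std()      # perturb one feature
        scores[j] = np.abs(net.predict_proba(Xp) - base).mean()
    return np.argsort(scores)[::-1]            # most sensitive first

def wrapper_search(X, y, order, max_iter=None):
    """Greedy forward wrapper over the sensitivity-ordered features:
    keep a feature only if it improves cross-validated accuracy of
    the induction algorithm (here, a decision tree)."""
    selected, best = [], -np.inf
    for j in order[:max_iter]:
        trial = selected + [j]
        acc = cross_val_score(DecisionTreeClassifier(),
                              X[:, trial], y, cv=5).mean()
        if acc > best:
            selected, best = trial, acc
    return selected, best
```

In this sketch the wrapper scans the ranked features once, so the number of induction runs grows linearly with the number of features rather than exponentially with the size of the power set, which is the efficiency argument the abstract makes for ordering the search space first.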


