A Feature Selection Technique based on Distributional Differences

  • Published : 2006.03.01

Abstract

This paper presents a feature selection technique based on distributional differences for efficient machine learning. The initial training data consist of examples with many features and a target value. We classify the examples into positive and negative data according to the target value, divide the range of each feature's values into 10 intervals, and compute the distribution over those intervals separately for the positive and the negative data. We then select the features, and the intervals of those features, whose distributional differences exceed a given threshold; restricting the training data to the selected features and intervals yields reduced training data. Our experiments show that the reduced training data cut the training time of a neural network by about 40%, and that the functions trained on them yield higher profit in simulated stock trading.
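The selection step lends itself to a short illustration. Below is a minimal sketch in Python, assuming numeric features and binary positive/negative labels; the function name `select_intervals` and the `threshold` default are assumptions for illustration, while the 10-interval split follows the abstract.

```python
import numpy as np

def select_intervals(X, y, n_bins=10, threshold=0.1):
    """For each feature, return the intervals (bin indices) whose relative
    frequencies in the positive and negative data differ by more than
    `threshold`; features with no such interval are dropped.
    NOTE: equal-width binning and the threshold value are assumptions."""
    pos, neg = X[y == 1], X[y == 0]
    selected = {}
    for j in range(X.shape[1]):
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:
            continue  # a constant feature shows no distributional difference
        edges = np.linspace(lo, hi, n_bins + 1)  # 10 equal-width intervals
        pos_counts, _ = np.histogram(pos[:, j], bins=edges)
        neg_counts, _ = np.histogram(neg[:, j], bins=edges)
        # Relative frequency of each interval within the positive/negative data.
        pos_dist = pos_counts / max(len(pos), 1)
        neg_dist = neg_counts / max(len(neg), 1)
        # Keep the intervals whose distributional difference exceeds the threshold.
        bins = np.where(np.abs(pos_dist - neg_dist) > threshold)[0]
        if bins.size > 0:
            selected[j] = bins  # feature j is kept, with these intervals
    return selected
```

Under these assumptions, the reduced training data would retain only the returned features and, for each of them, only the examples whose values fall into one of the selected intervals, consistent with the reduction described in the abstract.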
