An Improvement of FSDD for Evaluating Multi-Dimensional Data

  • Oh, Se-jong (오세종), Dept. of Software Science, College of Engineering, Dankook University
  • Received : 2016.11.22
  • Accepted : 2017.01.20
  • Published : 2017.01.28

Abstract

Feature selection, also called variable selection, is a data mining technique that selects the features most relevant to the target concept from high-dimensional data. It reduces the dimensionality of the data and makes clustering and classification analysis easier. A feature selection scheme requires an evaluation function. Most current evaluation functions are based on statistics or information theory, and they can evaluate only a single feature (one-dimensional data) at a time. However, features interact with one another, so efficient feature selection requires an evaluation function for multi-dimensional data. In this study, we propose a modification of the FSDD evaluation function that uses an extended distance function to evaluate multiple features together. The original FSDD can evaluate only a single feature. The proposed approach is expected to be applicable to other single-feature evaluation methods.
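
The abstract does not give the extended distance function, so the following Python sketch only illustrates the general idea of scoring a feature subset with a distance discriminant: between-class distance minus a weighted within-class distance. It is not the paper's formula. The function name `subset_score`, the weight `beta`, the use of plain Euclidean distance over the selected columns, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def subset_score(rows, labels, features, beta=2.0):
    """Between-class distance minus beta * within-class distance.

    Assumption: Euclidean distance over the selected columns stands in for
    the extended distance function, whose exact form the abstract omits.
    `beta` is a hypothetical trade-off weight.
    """
    # Project every sample onto the chosen feature subset.
    points = [tuple(row[f] for f in features) for row in rows]
    n = len(points)

    def mean(pts):
        # Component-wise mean of a list of equal-length tuples.
        return tuple(sum(col) / len(pts) for col in zip(*pts))

    by_class = defaultdict(list)
    for point, label in zip(points, labels):
        by_class[label].append(point)

    overall = mean(points)
    between = 0.0
    within = 0.0
    for members in by_class.values():
        prior = len(members) / n          # empirical class probability
        center = mean(members)            # class centroid in the subset space
        between += prior * math.dist(center, overall)
        within += prior * sum(math.dist(p, center) for p in members) / len(members)
    return between - beta * within


# Toy data (made up): columns 0 and 1 separate the classes, column 2 is noise.
rows = [(0.0, 0.1, 5.0), (0.2, 0.0, 1.0), (1.0, 0.9, 4.0), (0.9, 1.1, 2.0)]
labels = ["a", "a", "b", "b"]

single = subset_score(rows, labels, [0])     # one feature
pair = subset_score(rows, labels, [0, 1])    # two features evaluated together
noise = subset_score(rows, labels, [2])      # uninformative feature
```

With a single column index the function behaves like a one-dimensional evaluation. Passing several indices evaluates the subset as one multi-dimensional point set, which is the kind of extension the abstract describes.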

Keywords

Feature selection; Feature evaluation; Multi-dimensional data; FSDD; Data mining

Acknowledgement

Grant : Development of a model for optimal growth management of crops in protected horticulture
