An Improvement of FSDD for Evaluating Multi-Dimensional Data

  • Oh, Se-jong (Dept. of Software Science, College of Engineering, Dankook University)
  • Received : 2016.11.22
  • Accepted : 2017.01.20
  • Published : 2017.01.28

Abstract

Feature selection, also called variable selection, is a data mining technique that picks out, from high-dimensional data with a very large number of features, those features that are highly relevant to the target concept. By reducing the dimensionality of the data, it makes cluster analysis and classification easier. Selecting a subset of features requires an evaluation function that scores them. Most existing evaluation functions are based on statistics or information theory and can evaluate only a single feature at a time, that is, one-dimensional data. Features interact with one another, however, so effective feature selection requires an evaluation function that can score a set of features, that is, multi-dimensional data. In this study, we propose a modification of the FSDD evaluation function, which was originally proposed for single-feature evaluation, that uses an extended distance function so that multiple features can be evaluated together. We expect that the proposed approach can also be applied to other single-feature evaluation methods.
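
The idea of scoring a set of features with a distance function can be illustrated with a minimal sketch. The abstract does not give the proposed formula, so everything below is an assumption made for illustration: the function name `subset_score`, the weight `beta`, the per-feature standardisation, and the choice of Euclidean distance to class centroids are hypothetical and should not be read as the paper's definition of FSDD or of its extension. The sketch only shows the general shape of a distance-discriminant criterion (between-class distance minus a weighted within-class distance) computed over several features at once.

```python
import math
from statistics import fmean


def subset_score(rows, labels, features, beta=2.0):
    """Score a feature subset: between-class distance minus beta * within-class distance.

    Illustrative assumption only; not the exact FSDD formula from the paper.
    """
    # Project every sample onto the chosen feature columns.
    points = [[row[j] for j in features] for row in rows]
    dim = len(features)

    # Standardise each selected feature so that no single scale dominates the distance.
    means = [fmean(p[k] for p in points) for k in range(dim)]
    stds = [
        math.sqrt(fmean((p[k] - means[k]) ** 2 for p in points)) or 1.0
        for k in range(dim)
    ]
    points = [[(p[k] - means[k]) / stds[k] for k in range(dim)] for p in points]

    origin = [0.0] * dim  # global centroid after standardisation
    between = within = 0.0
    for cls in set(labels):
        members = [p for p, y in zip(points, labels) if y == cls]
        prior = len(members) / len(points)
        centroid = [fmean(m[k] for m in members) for k in range(dim)]
        # Multi-dimensional Euclidean distance replaces the one-dimensional difference.
        between += prior * math.dist(centroid, origin)
        within += prior * fmean(math.dist(m, centroid) for m in members)
    return between - beta * within


if __name__ == "__main__":
    # Toy data (made up): columns 0 and 1 separate the classes, column 2 is noise.
    rows = [
        [1.0, 2.0, 5.0],
        [1.2, 2.1, 1.0],
        [0.9, 1.9, 3.0],
        [3.0, 4.0, 1.1],
        [3.1, 4.2, 5.1],
        [2.9, 3.9, 2.9],
    ]
    labels = ["a", "a", "a", "b", "b", "b"]
    for subset in ([0], [2], [0, 1], [0, 2]):
        print(subset, round(subset_score(rows, labels, subset), 3))
```

Passing a single column index reduces the sketch to a one-dimensional evaluation, while passing several indices scores them jointly; a higher score indicates classes that are farther apart relative to their internal spread under these assumptions.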
