Comparison among Algorithms for Decision Tree based on Sasang Constitutional Clinical Data

  • Jin, Hee-Jeong (Korea Institute of Oriental Medicine) ;
  • Lee, Su-Kyung (Dept. of Oriental Rehabilitation, Wonkwang University School of Oriental Medicine) ;
  • Lee, Si-Woo (Korea Institute of Oriental Medicine)
  • Received : 2011.06.30
  • Reviewed : 2011.08.03
  • Published : 2011.08.30

Abstract

Objectives : In the clinical field, it is important to understand the factors that affect a given disease or symptom. To this end, many researchers apply data mining methods to the clinical data they have collected. Decision tree induction is one of the efficient data mining methods. Many studies have sought the best split criterion for decision trees; however, various split criteria coexist. Methods : In this paper, we applied several split criteria (Information Gain, Gini Index, Chi-Square) to Sasang constitutional clinical information and compared the resulting decision trees in order to find the optimal split criterion. Results & Conclusion : By analyzing the decision trees produced with the different split measures, we found that BMI and body measurement factors are important factors for Sasang constitution. The decision tree using information gain had the highest accuracy. However, which decision tree yields the highest accuracy depends on the given data. Researchers should therefore look for the split criterion suited to their data by understanding the attributes of that data.
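
To make the three split criteria named in the abstract concrete, the following is a minimal Python sketch that scores one candidate binary split with information gain, Gini impurity reduction, and the Pearson chi-square statistic. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (`entropy`, `gini`, `split_scores`) and the toy class labels are hypothetical, and both branches are assumed to be non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(left, right):
    """Score a binary split given the class labels in each branch.

    Assumes both branches are non-empty (hypothetical helper, not from the paper).
    """
    parent = left + right
    n = len(parent)
    w_left, w_right = len(left) / n, len(right) / n

    # Impurity of the parent minus the size-weighted impurity of the children.
    info_gain = entropy(parent) - (w_left * entropy(left) + w_right * entropy(right))
    gini_gain = gini(parent) - (w_left * gini(left) + w_right * gini(right))

    # Pearson chi-square: observed class counts per branch versus the counts
    # expected if class and branch were independent.
    parent_counts = Counter(parent)
    chi_square = 0.0
    for branch in (left, right):
        observed = Counter(branch)
        for cls, total in parent_counts.items():
            expected = total * len(branch) / n
            chi_square += (observed.get(cls, 0) - expected) ** 2 / expected

    return {
        "information_gain": info_gain,
        "gini_gain": gini_gain,
        "chi_square": chi_square,
    }


if __name__ == "__main__":
    # Made-up labels standing in for two constitution types; not real clinical data.
    below_threshold = ["SE", "SE", "SE", "TE"]
    above_threshold = ["TE", "TE", "TE", "SE"]
    print(split_scores(below_threshold, above_threshold))
```

On a single split the three measures usually agree on whether it is informative, but they can rank competing candidate splits differently, which is why trees grown with different criteria can differ in structure and accuracy on the same data.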
