Application of text-mining technique and machine-learning model with clinical text data obtained from case reports for Sasang constitution diagnosis: a feasibility study

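Korean title (translated): Text mining analysis of Sasang constitution case reports based on natural language processing, and selection of a machine learning model for constitution diagnosis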

  • Jinseok Kim (Department of Korean Medicine, College of Korean Medicine, Sangji University)
  • So-hyun Park (Department of Korean Medicine, College of Korean Medicine, Sangji University)
  • Roa Jeong (Department of Korean Medicine, College of Korean Medicine, Sangji University)
  • Eunsu Lee (Department of Korean Medicine, College of Korean Medicine, Sangji University)
  • Yunseo Kim (Department of Korean Medicine, College of Korean Medicine, Sangji University)
  • Hyundong Sung (Department of Computer Science and Engineering, College of Engineering, Sogang University)
  • Jun-sang Yu (Department of Sasang Constitutional Medicine, College of Korean Medicine, Sangji University)
  • Received : 2024.08.02
  • Accepted : 2024.08.28
  • Published : 2024.09.01

Abstract

Objectives: We analyzed Sasang constitution case reports with text mining to obtain network analysis results, and we built machine learning classifiers to identify a model suited to classifying Sasang constitution from text data.

Methods: We searched for case reports on Sasang constitution published from January 1, 2000, to December 31, 2022. A total of 343 papers were selected, yielding 454 cases. The extracted texts were preprocessed and tokenized with the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. Word cloud visualization and centrality analysis were used to identify the keywords mainly used for classifying Sasang constitution in clinical practice. To select the most suitable classification model for diagnosing Sasang constitution, the performance of five models (XGBoost, LightGBM, SVC, Logistic Regression, and Random Forest Classifier) was evaluated using accuracy and F1-score.

Results: Word cloud visualization and centrality analysis identified specific keywords for each constitution. Logistic regression showed the highest accuracy (0.839416), while the random forest classifier showed the lowest (0.773723). By F1-score, XGBoost scored the highest (0.739811) and the random forest classifier scored the lowest (0.643421).

Conclusions: This is the first study to analyze constitution classification by applying text mining and machine learning to case reports, and it provides a concrete research model for follow-up research. The keywords selected through text mining were confirmed to effectively reflect the characteristics of each Sasang constitution type. Based on text data from case reports, the most suitable machine learning models for diagnosing Sasang constitution are logistic regression and XGBoost.
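The Methods and Results above rely on two calculations that a small worked example may make easier to follow: TF-IDF weighting of tokens, and the difference between accuracy and F1-score. The sketch below is not the authors' pipeline. It assumes already-tokenized documents (standing in for KoNLPy output), the plain textbook TF-IDF form (term frequency times log(N / document frequency); libraries often use smoothed variants), and macro-averaged F1, which the abstract does not specify. The tokens, labels, and function names are hypothetical toy values chosen for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return weights


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus: three tokenized "case reports".
docs = [
    ["cold", "sweat", "indigestion"],
    ["heat", "constipation", "sweat"],
    ["cold", "indigestion", "fatigue"],
]
weights = tf_idf(docs)

# Hypothetical labels: the single Soeumin case is misclassified.
y_true = ["Taeeumin", "Taeeumin", "Taeeumin", "Soeumin", "Soyangin", "Soyangin"]
y_pred = ["Taeeumin", "Taeeumin", "Taeeumin", "Taeeumin", "Soyangin", "Soyangin"]

print("TF-IDF of 'heat' in document 2:", weights[1]["heat"])
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

In this toy case one error in a rare class costs little accuracy but lowers macro F1 considerably, which is one way the two metrics can rank models differently, as they do in the Results.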
