Donguibogam-Based Pattern Diagnosis Using Natural Language Processing and Machine Learning

Development of Donguibogam-Based Korean Medicine Pattern Diagnosis Technology Using Natural Language Processing and Machine Learning

  • Lee, Seung Hyeon (Department of Information System, College of Engineering, Hanyang University)
  • Jang, Dong Pyo (Department of Biomedical Engineering, College of Engineering, Hanyang University)
  • Sung, Kang Kyung (Department of Internal Medicine, College of Oriental Medicine, Wonkwang University)
  • Received : 2020.06.30
  • Accepted : 2020.07.27
  • Published : 2020.09.01

Abstract

Objectives: This paper investigates Donguibogam-based pattern diagnosis using natural language processing and machine learning. Methods: A database was constructed by collecting symptoms and their pattern diagnoses from the Donguibogam. The symptom sentences were tokenized into nouns, verbs, and adjectives with a natural language processing tool. To make the symptom sentences usable for machine learning, a Word2Vec model was trained to convert words into numeric vectors. Using pairs of symptom vectors and pattern diagnoses, a pattern prediction model was trained with logistic regression. Results: The Word2Vec model performed best after its primary parameters (the number of iterations, the vector dimension, and the window size) were optimized. The resulting logistic regression model predicted the Six-Qi pattern diagnosis with 75% accuracy (chance level 16.7%). Conclusions: We developed a pattern diagnosis prediction model based on the symptoms and pattern diagnoses recorded in the Donguibogam. Prediction accuracy could be improved by collecting more data through future expansion to other classics of oriental medicine.
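The pipeline described in the abstract (word vectors, then a symptom vector, then logistic regression) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy embeddings, the two pattern labels, the averaging step that turns word vectors into one symptom vector, and all function names are assumptions made for illustration; the abstract does not say how sentence-level vectors were formed. The last helper shows where the 16.7% chance level comes from when six classes are equally likely.

```python
import math

# Hypothetical 2-D embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "fever": [1.0, 0.0],
    "thirst": [0.9, 0.1],
    "chills": [0.0, 1.0],
    "shivering": [0.1, 0.9],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of known tokens (an assumed pooling choice)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token has a vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, weights, biases):
    """Linear score per class: w_k . x + b_k."""
    return [sum(w * xi for w, xi in zip(w_k, x)) + b_k
            for w_k, b_k in zip(weights, biases)]


def train_logistic_regression(X, y, n_classes, epochs=500, lr=0.5):
    """Multinomial logistic regression fitted by plain stochastic gradient descent."""
    dim = len(X[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(class_scores(x, weights, biases))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class-k score.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for i in range(dim):
                    weights[k][i] -= lr * grad * x[i]
                biases[k] -= lr * grad
    return weights, biases


def predict(x, weights, biases):
    """Return the index of the highest-scoring class."""
    scores = class_scores(x, weights, biases)
    return max(range(len(scores)), key=scores.__getitem__)


def chance_level(n_classes):
    """Accuracy of uniform random guessing over equally likely classes."""
    return 1.0 / n_classes


# Toy training set: tokenized symptom sentences with hypothetical pattern labels.
PATTERNS = ["heat", "cold"]
train_sentences = [["fever", "thirst"], ["fever"], ["chills", "shivering"], ["chills"]]
train_labels = [0, 0, 1, 1]

X = [sentence_vector(s, TOY_VECTORS) for s in train_sentences]
W, b = train_logistic_regression(X, train_labels, n_classes=len(PATTERNS))

print(PATTERNS[predict(sentence_vector(["thirst"], TOY_VECTORS), W, b)])
print(f"chance level for six classes: {chance_level(6):.1%}")
```

In practice the embeddings would come from a Word2Vec model trained on the tokenized symptom sentences, and the classifier would cover all six Six-Qi patterns rather than the two placeholder labels used here.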
