Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification

  • Young-Nam Kim (Department of Biomedical Life Science, Graduate School of Public Health Science, Yonsei University)
  • Received : 2022.11.30
  • Reviewed : 2022.12.27
  • Published : 2022.12.31

Abstract

Objective : The purpose of this study is to identify the most suitable machine learning algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』; 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. The data were preprocessed with the Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest models. The accuracies of the models were then compared. Results : Ridge regression and the naive Bayes classifier each achieved an accuracy of 0.843, logistic regression and random forest each achieved 0.804, decision tree achieved 0.745, and lasso regression achieved 0.608. Conclusions : Ridge regression and the naive Bayes classifier are suitable NLP machine learning models for Shanghanlun diagnostic system classification.
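
To make the evaluation described in the abstract concrete, the following is a minimal sketch of one of the named models, a naive Bayes text classifier, scored by accuracy. It uses only the Python standard library. Everything specific in it is an assumption for illustration: the whitespace `tokenize` function stands in for the Korean tokenizer, the class labels and toy texts are hypothetical placeholders rather than items from the study's data, and the Laplace smoothing parameter `alpha` is an arbitrary choice. The abstract does not state how the study's features or models were actually implemented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for the Korean tokenizer.
    return text.lower().split()


class NaiveBayesText:
    """Multinomial naive Bayes over token counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (hypothetical default)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def _log_score(self, tokens, label):
        total = sum(self.token_counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        # Log prior of the class.
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, texts):
        return [
            max(self.class_counts, key=lambda lab: self._log_score(tokenize(t), lab))
            for t in texts
        ]


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up examples (NOT data from the study).
train_texts = [
    "chills fever headache",
    "fever stiff neck chills",
    "abdominal fullness vomiting",
    "diarrhea abdominal pain vomiting",
]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
test_texts = ["headache chills", "vomiting diarrhea"]
test_labels = ["class_a", "class_b"]

model = NaiveBayesText().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions, accuracy(test_labels, predictions))
```

Accuracy here is simply the share of held-out items whose predicted class matches the true class, which is the metric the abstract reports for each model.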

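Acknowledgement

I express my deep respect and gratitude to Professor Hwa-Jong Kim, who carefully taught a Korean medicine doctor with no background in artificial intelligence everything from the basics of Python coding; above all to Dr. Young-Bum Roh and Professor Kyung-Il Kim, who elucidated the Shanghanlun (傷寒論) diagnostic system (辨病診斷體系) through a paleographic interpretation of the text and published it in book form; and to all the senior researchers who have developed the Shanghanlun diagnostic system for patients suffering from intractable diseases. I am also always grateful to the UOI community, with whom I study.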
