Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis

  • Pak, Doohyun (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Hwang, Mingyu (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Lee, Minji (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Woo, Sung-Il (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Hahn, Sang-Woo (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Lee, Yeon Jung (Department of Psychiatry, Soonchunhyang University Seoul Hospital) ;
  • Hwang, Jaeuk (Department of Psychiatry, Soonchunhyang University Seoul Hospital)
  • Received : 2019.11.11
  • Accepted : 2020.03.09
  • Published : 2020.04.30

Abstract

Objectives The aim of this study was to identify effective vectorization and classification models for predicting a psychiatric diagnosis from text-based medical records.

Methods Present-illness sections of electronic medical records (n = 494) were collected retrospectively from inpatient admission notes carrying one of three diagnoses: major depressive disorder, bipolar I disorder, or schizophrenia. The data were split into 400 training documents and 94 independent validation documents and were vectorized with two models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning classifiers, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three diagnoses. Five-fold cross-validation was used to identify an effective model, and accuracy, precision, recall, and F1-score were measured to compare the models.

Results In five-fold cross-validation on the training data, the DL model with Doc2vec was the most effective model for predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). These metrics decreased on the independent test set with the final DL models (accuracy = 0.79, F1-score = 0.79), whereas logistic regression and support vector classification with Doc2vec performed slightly better (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and all models with TF-IDF.

Conclusions These results suggest that the choice of vectorization may affect classification performance more than the choice of machine learning model. However, the data set had several limitations, including a small sample size, class imbalance, and limited generalizability. In this regard, multi-site studies with larger samples are needed to improve the machine learning models.
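The following is a minimal sketch, not the authors' code, of how the vectorization and classification comparison described in the abstract could be set up with scikit-learn and gensim. The toy notes, labels, and model parameters are hypothetical stand-ins for the de-identified admission notes and the tuned settings used in the study.

```python
# Hedged sketch of the TF-IDF and Doc2vec pipelines described in the abstract.
# The notes/labels below are toy placeholders, not the study data.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

notes = ["depressed mood low energy and insomnia for two months",
         "elevated mood decreased need for sleep and grandiosity",
         "auditory hallucinations and persecutory delusions"] * 10
labels = ["MDD", "BD-I", "SCZ"] * 10

classifiers = {
    "SGD": SGDClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": LinearSVC(),  # one possible support vector classifier
}

# TF-IDF vectorization: each classifier is wrapped in a pipeline so the
# vectorizer is refit inside every cross-validation fold.
for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    cv = cross_validate(pipe, notes, labels, cv=5,
                        scoring=["accuracy", "f1_macro"])
    print(f"TF-IDF + {name}: acc={cv['test_accuracy'].mean():.2f}, "
          f"f1={cv['test_f1_macro'].mean():.2f}")

# Doc2vec vectorization: documents are embedded once and the vectors are fed
# to the same classifiers (a simplification of per-fold training).
tagged = [TaggedDocument(words=n.split(), tags=[i]) for i, n in enumerate(notes)]
d2v = Doc2Vec(vector_size=50, min_count=1, epochs=40)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)
X = [d2v.infer_vector(n.split()) for n in notes]

for name, clf in classifiers.items():
    cv = cross_validate(clf, X, labels, cv=5, scoring=["accuracy", "f1_macro"])
    print(f"Doc2vec + {name}: acc={cv['test_accuracy'].mean():.2f}, "
          f"f1={cv['test_f1_macro'].mean():.2f}")
```

The study also compared a deep learning classifier; a simple stand-in in the same loop could be scikit-learn's MLPClassifier, though the authors' DL architecture is not specified in the abstract.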
