Building and Analyzing Panic Disorder Social Media Corpus for Automatic Deep Learning Classification Model

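  • 이수빈 (Department of Library and Information Science, Yonsei University) ;
  • 김성덕 (Department of Library and Information Science, Yonsei University) ;
  • 이주희 (Department of Library and Information Science, Yonsei University) ;
  • 고영수 (Department of Library and Information Science, Yonsei University) ;
  • 송민 (Department of Library and Information Science, Yonsei University)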
  • Received : 2021.05.17
  • Accepted : 2021.06.15
  • Published : 2021.06.30

Abstract

This study builds and analyzes a panic disorder corpus in order to examine the characteristics of panic disorder and to develop a deep learning model that automatically classifies documents showing a panic disorder tendency. To this end, 5,884 panic-disorder-related documents collected from social media were manually annotated according to a diagnostic manual for mental disorders and labeled as either panic-disorder-prone or non-panic-disorder documents. To analyze the lexical characteristics of the panic-disorder-prone documents and the relationships among their words, TF-IDF scores were calculated and word co-occurrence analysis was performed. To analyze the characteristics of panic disorder symptoms and the relationships among them, symptom frequencies and the co-occurrence frequencies between annotated symptoms were calculated. The corpus was then used to train and evaluate deep learning classification models. Three pre-trained BERT models were adopted: BERT multilingual, KoBERT, and KcBERT. KcBERT showed the best performance among them. The study is meaningful in that it seeks to support the early diagnosis and treatment of people experiencing panic-disorder-related symptoms and to extend mental illness research to social media corpora.

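The two lexical analyses named in the abstract, TF-IDF scoring and word co-occurrence counting, can be illustrated with a minimal sketch. The abstract does not state the exact TF-IDF weighting, the tokenizer, or the co-occurrence window used in the study, so the choices below (length-normalized term frequency, a natural-log inverse document frequency, document-level co-occurrence) are assumptions, and the toy documents are hypothetical rather than taken from the corpus.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def cooccurrence(docs):
    """Count, for each unordered term pair, the documents containing both."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs


# Hypothetical, already-tokenized toy documents (not from the study's corpus).
docs = [
    ["heart", "racing", "fear"],
    ["fear", "breath", "heart"],
    ["sleep", "breath"],
]
scores = tf_idf(docs)
pairs = cooccurrence(docs)
```

Here `scores[0]` ranks "racing" above "heart" because "racing" appears in only one document, and `pairs` records how many documents each word pair shares. The same pair-counting idea would apply to annotated symptom labels in place of words.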

Acknowledgement

This research was supported by the National Research Foundation of Korea with funding from the government (NRF-2018S1A3A2075114).
