A Deep Learning-based Depression Trend Analysis of Korean on Social Media

  • Park, Seojeong (Department of Library and Information Science, Yonsei University) ;
  • Lee, Soobin (Department of Library and Information Science, Yonsei University) ;
  • Kim, Woo Jung (Department of Psychiatry, Yongin Severance Hospital, Yonsei University College of Medicine) ;
  • Song, Min (Department of Library and Information Science, Yonsei University)
  • Received : 2022.02.14
  • Accepted : 2022.03.04
  • Published : 2022.03.30

Abstract

The number of patients with depression in Korea and worldwide is increasing every year. However, most people with mental illness do not recognize that they are ill, so they do not receive appropriate treatment. Neglected depressive symptoms can develop into suicide, anxiety, and other psychological problems, so early detection and treatment of depression are critical to improving mental health. To address this problem, this study presents a deep learning-based depression tendency classification model for Korean social media text. Data were collected from Naver KnowledgeiN, Naver Blog, Hidoc, and Twitter, and each document was annotated with a class according to the number of depressive symptoms it showed, using the DSM-5 diagnostic criteria for major depressive disorder. TF-IDF analysis and co-occurrence word analysis were then performed to examine the characteristics of each class in the constructed corpus. To build a classification model that draws on diverse text features, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed, yielding an embedded text representation, a sentiment score, and a topic number for each document. The highest accuracy, 83.28%, was achieved when depression tendency was classified with KorBERT using the embedded text combined with both the document's sentiment score and its topic. By using diverse text features to build a Korean depression tendency classification model with improved performance, this study lays a foundation for detecting potential depression patients early among Korean online community users, enabling prompt treatment and prevention and thereby helping to promote mental health in Korean society.
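The abstract names TF-IDF analysis as the tool for characterizing each class but does not give the weighting scheme. The following is a minimal Python sketch of one common variant, assuming the posts are already tokenized (real Korean text would first need morphological analysis) and that each class is pooled into a single pseudo-document, so that idf measures how class-specific a term is. The function names `tfidf_by_class` and `top_terms`, the class labels, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def tfidf_by_class(docs_by_class):
    """Score each term per class, treating every class as one pooled document.

    tf(t, c) = count of t in class c / total tokens in class c
    idf(t)   = log(number of classes / number of classes containing t)
    A term that appears in every class therefore gets a score of 0.
    """
    pooled = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_by_class.items()
    }
    n_classes = len(pooled)
    # Class frequency: in how many classes does each term occur?
    df = Counter(term for counts in pooled.values() for term in counts)
    scores = {}
    for label, counts in pooled.items():
        total = sum(counts.values())
        scores[label] = {
            term: (cnt / total) * math.log(n_classes / df[term])
            for term, cnt in counts.items()
        }
    return scores


def top_terms(scores, label, k=3):
    """Return the k highest-scoring terms of a class; ties break alphabetically."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already-tokenized toy corpus (class labels are illustrative).
toy_corpus = {
    "none": [["today", "lunch", "friend"], ["friend", "movie"]],
    "mild": [["tired", "sleep", "friend"], ["sleep", "work"]],
    "severe": [["worthless", "sleep", "die"], ["worthless", "tired"]],
}
scores = tfidf_by_class(toy_corpus)
print(top_terms(scores, "severe", k=2))  # ['worthless', 'die']
```

Because idf is computed over classes rather than posts, a word used in every class scores 0, which is what makes the top terms class-distinctive. Computing TF-IDF per post and averaging within each class is an equally plausible reading of the abstract.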
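Co-occurrence word analysis can be sketched in the same style. This minimal version assumes that two words co-occur when they appear in the same post, and weights each word pair by the number of posts containing both; the pairs that pass a frequency threshold can then serve as weighted edges of a word network for one class. The helper names, the threshold, and the toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered pair of distinct terms, how many documents contain both.

    Each document is a list of tokens. A pair is counted at most once per
    document, so a word repeated inside one post does not inflate the weight.
    """
    pairs = Counter()
    for doc in docs:
        vocab = sorted(set(doc))  # dedupe, and sort so (a, b) and (b, a) coincide
        pairs.update(combinations(vocab, 2))
    return pairs


def cooccurrence_edges(pairs, min_count=2):
    """Keep pairs seen in at least min_count documents, as weighted network edges."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}


# Hypothetical tokenized posts from a single class.
posts = [
    ["sleep", "tired", "worthless"],
    ["sleep", "tired", "tired"],
    ["worthless", "die", "sleep"],
]
pairs = cooccurrence_counts(posts)
print(cooccurrence_edges(pairs))  # {('sleep', 'tired'): 2, ('sleep', 'worthless'): 2}
```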
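For the dictionary-based sentiment score, the abstract does not say which lexicon or aggregation was used. The sketch below assumes a lexicon that maps terms to integer polarities from -2 to +2 and scores a document as the sum of the polarities of its matched tokens. `TOY_LEXICON` and its values are invented for illustration, and only single-token entries are matched; a real Korean lexicon also contains multi-morpheme entries.

```python
# Hypothetical toy lexicon: term -> polarity in [-2, 2]. The values are
# illustrative only and are not taken from any real Korean sentiment dictionary.
TOY_LEXICON = {"happy": 2, "fine": 1, "tired": -1, "hopeless": -2, "worthless": -2}


def lexicon_sentiment(tokens, lexicon=TOY_LEXICON):
    """Return (score, matched): the summed polarity of lexicon hits and their count.

    Tokens not in the lexicon contribute nothing. Using the raw sum as the
    document score is an assumption of this sketch; dividing by `matched`
    would give a length-insensitive mean instead.
    """
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits), len(hits)


print(lexicon_sentiment(["i", "feel", "tired", "and", "worthless"]))  # (-3, 2)
print(lexicon_sentiment(["no", "match", "here"]))  # (0, 0)
```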
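The abstract reports that the best result came from combining the embedded text with both the sentiment score and the topic, but not how the three were joined. One simple possibility, assumed here, is to take the most probable LDA topic as the document's topic number and concatenate a document embedding vector, the sentiment score, and a one-hot topic vector into a single feature vector. All names and numbers below are hypothetical placeholders.

```python
def dominant_topic(topic_distribution):
    """Return the index of the most probable topic (assumed 'topic number')."""
    return max(range(len(topic_distribution)), key=lambda i: topic_distribution[i])


def fuse_features(text_embedding, sentiment_score, topic_id, num_topics):
    """Concatenate [text embedding | sentiment score | one-hot topic].

    The topic is one-hot encoded so that a classifier does not read an
    order into arbitrary topic IDs (a design choice of this sketch).
    """
    if not 0 <= topic_id < num_topics:
        raise ValueError(f"topic_id {topic_id} outside [0, {num_topics})")
    one_hot = [0.0] * num_topics
    one_hot[topic_id] = 1.0
    return [float(x) for x in text_embedding] + [float(sentiment_score)] + one_hot


# Placeholder 4-d vector standing in for a document embedding
# (a BERT-base encoder would produce 768 dimensions).
embedding = [0.1, -0.3, 0.7, 0.2]
theta = [0.1, 0.6, 0.3]  # hypothetical LDA document-topic distribution
features = fuse_features(embedding, -3, dominant_topic(theta), num_topics=len(theta))
print(features)  # [0.1, -0.3, 0.7, 0.2, -3.0, 0.0, 1.0, 0.0]
```

In a full model the embedding would come from the fine-tuned encoder (for example its [CLS] output) and the fused vector would feed a dense softmax layer; whether the original study fused the features at that point or elsewhere is not stated in the abstract.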

Acknowledgement

This work was supported by a National Research Foundation of Korea grant funded by the Korean government (NRF-2018S1A3A2075114).
