Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being

A Context-Based Polarity Analysis Method for Unstructured Data to Measure Subjective Well-Being States

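  • Seok-Jae Choi (최석재; Big Data Research Center, Kyung Hee University) ;
  • Young-Eun Song (송영은; Department of Business Administration, Graduate School, Kyung Hee University) ;
  • Oh-Byung Kwon (권오병; School of Management, Kyung Hee University)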
  • Received : 2016.02.22
  • Accepted : 2016.03.03
  • Published : 2016.03.31

Abstract

Measuring an individual's subjective wellbeing (SWB) accurately, unobtrusively, and cost-effectively is a core success factor of a wellbeing support system, which is a type of medical IT service. Self-report questionnaires and wearable sensors are very accurate, but they are cost-intensive and obtrusive when the wellbeing support system must run in real time. Recently, inferring the state of subjective wellbeing from unstructured data with conventional sentiment analysis has been proposed as an alternative that avoids these drawbacks. However, this approach does not consider contextual polarity, which lowers measurement accuracy. Moreover, there is no sentiment wordnet or ontology for the subjective wellbeing domain. Hence, this paper proposes a method that extracts keywords and their contextual polarity representing the subjective wellbeing state from unstructured text on websites, in order to improve the inference accuracy of sentiment analysis.

The proposed method is as follows. First, a set of general sentiment words is prepared. SentiWordNet was adopted; it is the most widely used dictionary and contains about 100,000 words (nouns, verbs, adjectives, and adverbs) with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes each individual's opinion and self-reported level of wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text and simple statistics on the special characters used). These were used to adjust the level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate an individual's SWB from the text that individual wrote.

The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentiment words. Because of the small number of words, many sentences are overlooked when estimating the level of sentiment. In contrast, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of sentiment word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense, such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing.

The study is helpful to practitioners and managers of wellness services in that it identifies characteristics of unstructured text that can improve SWB measurement. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue from online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
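The idea of contextual polarity and of structural text features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two lexicons, their scores, the "stress" factor vocabulary, and the function names are all hypothetical, and a real system would load SentiWordNet scores instead of the small stand-in dictionary used here.

```python
import re

# Hypothetical general-purpose lexicon (stand-in for SentiWordNet scores in [-1.0, 1.0]).
GENERAL_LEXICON = {"happy": 0.8, "terrible": -0.9, "good": 0.6}

# Hypothetical contextual lexicon for one SWB factor ("stress"): words that are
# neutral in general but carry polarity in this context. Values are made up.
STRESS_CONTEXT_LEXICON = {"deadline": -0.5, "overtime": -0.6, "vacation": 0.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity_score(text, general, contextual=None):
    """Average polarity of the words found in the lexicons.

    Contextual entries take precedence over general ones; words found in
    neither lexicon are skipped. Returns 0.0 if no word is matched.
    """
    contextual = contextual or {}
    scores = []
    for word in tokenize(text):
        if word in contextual:
            scores.append(contextual[word])
        elif word in general:
            scores.append(general[word])
    return sum(scores) / len(scores) if scores else 0.0


def structural_features(text):
    """Simple structural statistics: word count and special-character count."""
    special = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    return {"n_words": len(tokenize(text)), "n_special": special}


sample = "Another deadline and more overtime... good grief!!"
general_only = polarity_score(sample, GENERAL_LEXICON)
with_context = polarity_score(sample, GENERAL_LEXICON, STRESS_CONTEXT_LEXICON)
features = structural_features(sample)
```

With the general lexicon alone, only "good" is matched and the sample looks positive. Once the hypothetical stress vocabulary is added, "deadline" and "overtime" are scored as well and the average turns negative, which is the kind of shift the abstract attributes to contextual polarity.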

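The key to implementing subjective well-being services for mental health promotion, a promising area of medical IT services, is to measure an individual's subjective well-being state accurately, unobtrusively, and cost-efficiently. The measurement methods commonly used for this purpose, self-report by questionnaire and body-worn sensors, are highly accurate but weak in cost efficiency and unobtrusiveness. Online text-based measurement methods, which aim to improve cost efficiency and unobtrusiveness, use only sentiment words prepared in advance. They therefore fail to consider so-called contextual polarity, in which a word can be regarded as a sentiment word depending on the context, and their measurement accuracy is low. Meanwhile, for existing sentiment analysis that uses contextual polarity, no sentiment lexicon or ontology has been built that supports sentiment analysis in the context of subjective well-being states. Moreover, building an ontology is itself a very laborious task. The purpose of this study is therefore to propose a method that extracts contextual sentiment words related to subjective well-being from unstructured text in which users express their opinions online, and on that basis improves the accuracy of identifying contextual polarity. The basic procedure is as follows. First, a general sentiment lexicon is prepared; this study used SentiWordNet, the most representative digital sentiment lexicon. Second, corpora, the unstructured data needed to dynamically estimate a mental health index, were obtained through an online survey. Third, three kinds of resources were obtained from the corpora. Fourth, an inference model was built with these resources as input variables and the index value of a specific mental health state as the dependent variable, and inference rules were extracted. Finally, the mental health state was inferred using the inference rules. This study is original in that, unlike previous studies, it applies contextual sentiment words when analyzing sentiment and can thus identify diverse sentiment vocabulary according to the specific domain.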
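The inference step, in which extracted resources are input variables and a mental health index is the dependent variable, can be sketched as learning a single threshold rule. The abstract does not say which rule-learning technique the authors used, so the decision-stump approach below is a swapped-in illustration. The function name, the "polarity" feature, the "high"/"low" stress labels, and the toy data are all hypothetical.

```python
def learn_threshold_rule(samples, feature):
    """Find the threshold on one feature that best separates two labels.

    samples: list of (feature_dict, label) pairs with label in {"high", "low"}.
    Returns (threshold, label_if_below, accuracy) for the rule
    "if feature < threshold then label_if_below else the other label",
    or None if the feature takes fewer than two distinct values.
    """
    values = sorted({feats[feature] for feats, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in candidates:
        for below, above in (("high", "low"), ("low", "high")):
            correct = sum(
                1 for feats, label in samples
                if (below if feats[feature] < threshold else above) == label
            )
            accuracy = correct / len(samples)
            if best is None or accuracy > best[2]:
                best = (threshold, below, accuracy)
    return best


# Toy training data (made up): contextual polarity score vs. self-reported stress.
training = [
    ({"polarity": -0.6}, "high"),
    ({"polarity": -0.2}, "high"),
    ({"polarity": 0.1}, "low"),
    ({"polarity": 0.5}, "low"),
]
rule = learn_threshold_rule(training, "polarity")
threshold, label_if_below, accuracy = rule
```

On this toy data the learned rule reads "if polarity is below the threshold, predict high stress". A fuller model would combine several inputs, such as the demographic and structural features named in the abstract, rather than a single score.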
