Issue tracking and voting rate prediction for 19th Korean president election candidates

Identifying Issues and Predicting Voting Rates for the 19th Korean Presidential Election Candidates through Comment Analysis

  • Seo, Dae-Ho (Graduate School of Information, Yonsei University) ;
  • Kim, Ji-Ho (Division of Industrial Management Engineering, Korea University) ;
  • Kim, Chang-Ki (Graduate School of Information, Yonsei University)
  • Received : 2018.05.28
  • Accepted : 2018.09.12
  • Published : 2018.09.30

Abstract

The everyday use of the Internet and the spread of smart devices have enabled users to communicate in real time and have changed existing styles of communication. As the Internet changed who produces information, data grew massive, giving rise to the very large body of information called big data. Big data is now seen as a new opportunity to understand social issues. Text mining, in particular, explores patterns in unstructured text data to find meaningful information. Because text data exists in many places, such as newspapers, books, and the web, it is diverse and abundant, which makes it well suited to understanding social reality. In recent years there have been a growing number of attempts to analyze text from the web, such as SNS posts and blogs, where the public can communicate freely. Such analysis is recognized as a useful way to grasp public opinion immediately, so it can be applied to research on political, social, and cultural issues. Text mining has also received much attention as a way to investigate candidates' public reputations and to predict the voting rate (vote share) in place of opinion polls, because many people question the credibility of polls and respondents tend to refuse to answer or to conceal their real intentions.

This study collected news comments from the largest Internet portal site in Korea and examined the 19th Korean presidential election in 2017. We collected 226,447 comments posted from April 29, 2017 to May 7, 2017, a span that includes the period just before election day during which publication of opinion polls is prohibited. We analyzed word frequencies, associative emotional words, and topic emotions, and predicted candidates' voting rates. Frequency analysis identified the words that represented the most important issues on each day. In particular, after the presidential debates, the candidate who had become the focus of attention appeared at the top of the frequency ranking. Analysis of associative emotional words identified the issues most relevant to each candidate. Topic emotion analysis identified each candidate's topics and expressed the public's emotions about those topics. Finally, we estimated the voting rate by combining the volume of comments and the sentiment score. In this way we explored the issues for each candidate and predicted the voting rate.

The analysis showed that news comments are an effective tool for tracking issues about presidential candidates and for predicting the voting rate. Specifically, the study produced the issues for each day and a quantitative index of sentiment, and its predicted voting rates exactly matched the ranking of the top five candidates. Candidates can use such results to grasp public opinion objectively and reflect it in their election strategies, making more active use of positive issues and trying to correct negative ones. In particular, candidates should be aware that a moral problem can severely damage their reputation. Voters can view the issues and public opinion about each candidate objectively and make more informed decisions when voting. By referring to results such as these before voting, they can see public opinion as reflected in big data and vote from a more objective perspective. If candidates campaign with reference to big data analysis, the public will recognize that their wishes are being reflected and will become more active on the web. Political views can be expressed in many places on the web, and this can contribute to political participation by the people.

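The everyday use of the Internet and the spread of smart devices have enabled users to communicate in real time, transforming existing modes of communication. The change in who produces information on the Internet has made data ever more massive, giving rise to the very large body of information called big data. Big data is regarded as a new opportunity to understand social reality. Text mining, in particular, explores patterns in unstructured text data to find meaningful information. Because text data exists in many places, such as newspapers, books, the web, and SNS, it is diverse and vast, which makes it suitable data for understanding social reality. This study collected comments on news from the largest Internet portal site in Korea and examined the 19th Korean presidential election of 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a period that includes the ban on publishing opinion polls just before election day, and performed frequency analysis, associative emotional word analysis, topic emotion analysis, and candidate voting rate prediction. Through these analyses we examined and interpreted the issues surrounding each candidate and predicted the voting rates. The results showed that news comments are an effective tool for tracking issues about presidential candidates and for predicting voting rates. Candidates can judge public opinion objectively and reflect it in their campaign strategies, and voters can identify the issues concerning each candidate and refer to them when voting. Moreover, if candidates run their campaigns with reference to big data analysis, the public will recognize that their wishes are being conveyed to and reflected by the candidates and will become more active on the web. This has social significance as an act of political participation by the people.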
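
The abstract states that daily issues were found by frequency analysis and that the voting rate was estimated by combining comment volume with a sentiment score, but it does not give the formulas. The Python sketch below is only a rough illustration of how such steps might look. The function names, the multiplicative weighting of volume by sentiment, the assumption that sentiment lies in [0, 1], and the toy inputs are all hypothetical and are not the authors' method or data. Comments are assumed to be tokenized already.

```python
from collections import Counter, defaultdict


def top_terms_per_day(comments, k=3):
    """Return the k most frequent tokens for each day.

    comments: iterable of (day, tokens) pairs, where tokens is a list of
    already-tokenized words (tokenization itself is outside this sketch).
    """
    counts = defaultdict(Counter)
    for day, tokens in comments:
        counts[day].update(tokens)
    return {day: counter.most_common(k) for day, counter in counts.items()}


def estimate_vote_share(volume, sentiment):
    """Combine comment volume and sentiment into normalized shares.

    Hypothetical rule: weight each candidate's comment count by a
    sentiment score in [0, 1], then divide by the total so that the
    shares sum to 1. The paper's actual combination may differ.
    """
    weighted = {name: volume[name] * sentiment[name] for name in volume}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("all weighted volumes are zero; cannot normalize")
    return {name: value / total for name, value in weighted.items()}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_comments = [
        ("day1", ["debate", "debate", "tax"]),
        ("day2", ["tax", "security", "tax"]),
    ]
    print(top_terms_per_day(toy_comments, k=1))

    toy_volume = {"candidate_a": 100, "candidate_b": 100}
    toy_sentiment = {"candidate_a": 0.75, "candidate_b": 0.25}
    print(estimate_vote_share(toy_volume, toy_sentiment))
```

Normalizing at the end keeps the estimates comparable to vote shares regardless of how many comments were collected in total. Any real use would need a validated sentiment lexicon or classifier and a check against known results.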

Cited by

  1. A Study on the Comparison and Semantic Analysis between SNS Big Data, Search Portal Trends, and Narcotics Case Statistics vol.19, no.2, 2021, https://doi.org/10.14400/jdc.2021.19.2.231