Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques

텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석

  • Bae, Jung-Hwan (Graduate School, Dept. of Library and Information Science, Yonsei University)
  • Son, Ji-Eun (Graduate School, Dept. of Library and Information Science, Yonsei University)
  • Song, Min (Dept. of Library and Information Science, Yonsei University)
  • Received : 2013.05.25
  • Accepted : 2013.07.23
  • Published : 2013.09.30

Abstract

Social media is a representative form of Web 2.0 that changes users' information behavior by allowing them to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to share their opinions and thoughts with acquaintances and the general public. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining, not only because it produces massive unstructured textual data in real time but also because it serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets that mention the candidates' names and the election from Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that distinguish each candidate: 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows how the topics change over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users can establish relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, which are unique to Twitter, could reveal different aspects of user behavior.
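As a rough illustration of term co-occurrence retrieval, the sketch below counts, for a query term, how many tweets contain each other term alongside it. It is a minimal sketch and not the system described in the abstract: the function name `cooccurring_terms`, the whitespace tokenization, and the toy tweets are all assumptions made for the example, and real Korean text would need a morphological analyzer.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=5):
    """Rank terms by the number of tweets in which they appear with `query`.

    Hypothetical helper for illustration only; tokenization is a naive
    lowercase whitespace split.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())  # each term counted once per tweet
        if query in tokens:
            counts.update(tokens - {query})
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy data (invented for the example, not taken from the study).
tweets = [
    "moon poll election",
    "moon poll debate",
    "park poll election",
    "ahn debate",
]

print(cooccurring_terms(tweets, "moon"))
```

Counting each term once per tweet gives document-level co-occurrence; counting every repeated token instead would be an equally plausible choice, and the abstract does not say which the authors used.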

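Social media has recently become a worldwide communication tool. Because using it requires no specialized knowledge or skills, it lets users produce and share content in real time and is reshaping existing modes of communication. In particular, as a new communication medium, it spreads domestic and international social issues in real time and lets users communicate their opinions to acquaintances and the public, which can even lead to social change. This shift in who produces information has made data far more voluminous and has brought about an overflow of information known as 'Big Data'. Big Data has come into the spotlight both as a new opportunity to understand social reality and as a new research area for discovering meaningful information, and various studies are actively being conducted to analyze it efficiently. However, research on social media to date has been confined to limited analyses based on broad-brush approaches. To address this, we developed a real-time Twitter trend mining system for capturing social issues, with the goal of efficiently collecting the big stream data generated in vast quantities on Twitter in real time and mining new information and knowledge through various analyses of the collected documents. The system provides term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling of trend changes, and mention-based user network analysis, and we used it to conduct a case study on the 2012 Korean presidential election. The test collection for this study consists of 1,737,969 tweets collected from October 1 to October 31, 2012. The main significance of this case study is that it makes it possible to mine the social trends generated on Twitter using state-of-the-art techniques. It shows that Twitter is a useful tool for efficiently tracking and predicting changes in social issues, and that the mention-based network is an implicit network unique to Twitter that reveals another aspect of user networks.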
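To make the idea of a mention-based user network concrete, the sketch below builds weighted directed edges from `@mentions` in tweet text. It is a minimal sketch under stated assumptions: the `(author, text)` input shape, the `mention_edges` name, the regular expression, and the toy tweets are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Assumed handle pattern: "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count directed (author, mentioned_user) pairs.

    `tweets` is an iterable of (author, text) tuples; handles are lowercased
    so that differently cased mentions map to the same node.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges


# Toy data (invented for the example).
tweets = [
    ("alice", "@bob did you see the poll?"),
    ("bob", "@Alice yes, and @carol too"),
    ("alice", "@bob thanks"),
]

edges = mention_edges(tweets)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

The resulting edge counts could be loaded into a graph library for visualization; the abstract does not state which tool the authors used for that step.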
