Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs

  • 최준연 (Department of Digital Contents, Sejong University)
  • 백혜득 (Department of Digital Contents, Sejong University)
  • 최진호 (School of Business, Sejong University)
  • Received : 2014.01.31
  • Accepted : 2014.03.02
  • Published : 2014.03.28

Abstract

With the spread of social media, many users express their thoughts and opinions through SNS and interact with other users. In particular, microblogs such as Twitter serve as a platform on which many people instantly express and exchange opinions about common topics such as movies, TV, and social issues in short posts. Opinions and emotions about TV broadcast programs are also expressed through microblogs. To examine the relation between the content of microblogs and TV ratings, this study collected tweets about terrestrial broadcast programs, removed inappropriate tweets, and performed morphological analysis. Not only the extracted morphemes but every token entered by users, including emoticons and newly coined words, was taken as a candidate feature, and its correlation with ratings was analyzed. For the experiment, tweets about entertainment programs were collected for ten months from January 2013 and compared with nationwide ratings data. The number of tweets was highest on the day of the week the program was broadcast and increased sharply around the broadcasting time. This appears to be because terrestrial programs, which are broadcast nationwide at the same time, provide a shared topic of interest. As count-based features, the total numbers of tweets and retweets on the broadcasting day and the numbers of tweets and retweets during the broadcast were correlated with ratings, but all showed low correlation coefficients. This means that simple tweet frequency does not properly reflect satisfaction with a program or its ratings. Among the words extracted as content-based features, some showed a high correlation with ratings, and high-correlation features also appeared among non-standard tokens such as emoticons and newly coined words. In addition, the words with high correlation coefficients differed before and after the start of the broadcast. Because TV programs are broadcast at the same time every week, tweets anticipating the broadcast differed in content from tweets expressing impressions after the broadcast. These results mean that the time period in which a word is most strongly related to ratings differs from word to word, and that the time period of each word must be taken into account when estimating ratings. The proposed method is expected to be useful for supplementing TV ratings measurement, which is currently based on household sampling.
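To make the feature-extraction step described above concrete, the Python sketch below tokenizes a few made-up tweets and counts every token, keeping emoticons and newly coined words as candidate features alongside ordinary words. The sample tweets, the regular-expression tokenizer, and the function name are illustrative assumptions; the study itself applied a Korean morphological analyzer and its own filtering rules to the roughly 300 thousand tweets it collected.

```python
import re
from collections import Counter

# Hypothetical example tweets; the study collected about 300K tweets via the
# Twitter REST API and removed advertising/promotional posts before analysis.
tweets = [
    "오늘 방송 기대된다 ㅠㅠ #예능",
    "본방사수 못해서 아쉽다...",
]

# Rough token pattern: Korean words, Latin words, runs of jamo-style emoticons
# (ㅠㅠ, ㅋㅋ, ...), and repeated punctuation are all kept as candidate features,
# mirroring the paper's decision not to discard non-standard tokens.
TOKEN_RE = re.compile(r"[가-힣]+|[A-Za-z]+|[ㄱ-ㅎㅏ-ㅣ]+|[!?~.]{2,}")

def extract_candidate_features(texts):
    """Count every token (word, emoticon, coined word) across the given tweets."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text))
    return counts

print(extract_candidate_features(tweets).most_common(10))
```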

Social media has become a platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a great deal of research into capturing social phenomena by analyzing the chatter of microblogs; however, measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch, and users interact with one another in this way while watching television or movies or visiting a new place. When measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change over the course of the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Therefore, modeling the time-related characteristics of features is key to measuring TV ratings through microblogs. We show that capturing the time-dependency of features is necessary for improving the accuracy of ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. Our data set contains about 300 thousand posts; after excluding advertising and promotional tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings, which implies that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons and newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We also find a time-dependency in the correlation of features before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or their disappointment over not being able to watch it. The features that are highly correlated before the broadcast differ from those after the broadcast, which shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning.
Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research contributes a basis for estimating the response to, or satisfaction with, broadcast programs using the time-dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.
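A minimal sketch of the time-dependent correlation idea described in the abstract is shown below: for each candidate word, per-episode counts are kept separately for a pre-broadcast and a post-broadcast window, and the Pearson correlation of each series with the episode ratings determines the window in which the word is most relevant. The episode ratings, word counts, and window split used here are hypothetical illustrative values, not the study's data.

```python
import numpy as np

# Hypothetical per-episode data: ratings (%) and a word's tweet counts in the
# pre-broadcast and post-broadcast windows for each of five episodes.
ratings = np.array([12.3, 14.1, 11.8, 15.0, 13.2])
counts_before = {"기대": np.array([40, 55, 33, 62, 48]),
                 "재밌": np.array([10, 12, 9, 11, 10])}
counts_after = {"기대": np.array([15, 14, 16, 13, 15]),
                "재밌": np.array([80, 110, 70, 130, 95])}

def pearson(x, y):
    """Plain Pearson correlation coefficient between two series."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def best_window(word):
    """Pick the window (before/after broadcast) where the word correlates most with ratings."""
    r_before = pearson(counts_before[word], ratings)
    r_after = pearson(counts_after[word], ratings)
    window = "before" if abs(r_before) >= abs(r_after) else "after"
    return window, r_before, r_after

for w in ("기대", "재밌"):
    print(w, best_window(w))
```

The same comparison, applied to every word that meets a minimum-frequency requirement, would separate words whose relevance peaks before the broadcast (anticipation) from those whose relevance peaks afterwards (impressions), in the spirit of the 145-versus-68 split reported above.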

Keywords
