A Survey on Automatic Twitter Event Summarization

  • Rudrapal, Dwijen (Dept. of Computer Science and Engineering, National Institute of Technology)
  • Das, Amitava (Dept. of Computer Science and Engineering, Indian Institute of Information Technology)
  • Bhattacharya, Baby (Dept. of Mathematics, National Institute of Technology)
  • Received : 2017.11.08
  • Accepted : 2017.12.26
  • Published : 2018.02.28

Abstract

Twitter is one of the most popular social platforms for sharing trending information and views on any event. Twitter reports an event faster than any other medium and carries an enormous volume of information and opinion about it. Consequently, Twitter topic summarization is one of the most convenient ways to get an instant gist of an event. However, the information shared on Twitter is often full of nonstandard abbreviations, acronyms, out-of-vocabulary (OOV) words, and grammatical mistakes, which make it difficult to find reliable and useful information about an event. Twitter event summarization is therefore a challenging task on which traditional text summarization methods do not work well. Over the last decade, various research works have introduced different approaches to automatic Twitter topic summarization. The main aim of this survey is to give a broad overview of promising approaches to summarizing a Twitter topic. We also cover the automatic evaluation of summarization techniques by surveying recent evaluation methodologies. At the end of the survey, we highlight both current and future research challenges in this domain through an in-depth analysis of the most recent summarization approaches.

Fig. 1. Unigram words graph of a tweet.

Fig. 2. Tree structure of Twitter topic summarization approaches.

Fig. 3. Phrase graph of tweets using PR algorithm.

Table 1. Brief report on extractive summarization approaches

Table 2. Brief report on abstractive summarization approaches
