Company Name Discrimination in Tweets using Topic Signatures Extracted from News Corpus

  • Hong, Beomseok (Department of Computer and Information Science, Towson University)
  • Kim, Yanggon (Department of Computer and Information Science, Towson University)
  • Lee, Sang Ho (School of Software, Soongsil University)
  • Received : 2016.11.13
  • Accepted : 2016.12.05
  • Published : 2016.12.30

Abstract

No human can analyze the more than 500 million tweets generated per day. Lexical ambiguity on Twitter makes it difficult to retrieve the desired data and relevant topics. Most solutions to the word sense disambiguation problem rely on knowledge base systems. Unfortunately, manually creating a knowledge base system is expensive and time-consuming, which results in a knowledge acquisition bottleneck. To overcome this bottleneck, topic signatures are used to disambiguate words. In this paper, we evaluate how effective various features of news articles are for extracting topic signatures for word sense discrimination in tweets. In our results, topic signatures obtained from the snippet feature discriminate company names more accurately than those obtained from the article body. We conclude that topic signatures extracted from news articles improve the accuracy of word sense discrimination in the automated analysis of tweets.
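The idea of a topic signature can be illustrated with a minimal sketch. It assumes a signature is a set of terms weighted by how strongly they are associated with news text about the company compared with background text, and that a tweet is scored by summing the weights of the signature terms it contains. The smoothed log-ratio weighting, the function names, the threshold, and the sample snippets are all illustrative assumptions, not the weighting or data used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def topic_signature(target_docs, background_docs, exclude=(), top_k=10):
    """Return the top_k terms most associated with target_docs.

    The weight is a smoothed log ratio of relative frequencies.
    This is an illustrative choice, not the paper's weighting.
    """
    target = Counter(t for d in target_docs for t in tokenize(d))
    background = Counter(t for d in background_docs for t in tokenize(d))
    vocab = set(target) | set(background)
    # Add-one smoothing so unseen background terms do not divide by zero.
    n_target = sum(target.values()) + len(vocab)
    n_background = sum(background.values()) + len(vocab)
    weights = {}
    for term in target:
        if term in exclude:
            continue  # the ambiguous name itself carries no signal
        p_target = (target[term] + 1) / n_target
        p_background = (background[term] + 1) / n_background
        weights[term] = math.log(p_target / p_background)
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def score(tweet, signature):
    """Sum the signature weights of the distinct tokens in the tweet."""
    return sum(signature.get(tok, 0.0) for tok in set(tokenize(tweet)))


def is_company(tweet, signature, threshold=0.0):
    """Label the tweet as company-related if its score exceeds threshold."""
    return score(tweet, signature) > threshold


# Hypothetical news snippets, invented for illustration only.
company_snippets = [
    "Apple unveils new iPhone as shares rise",
    "Apple reports record quarterly revenue on iPhone sales",
]
background_snippets = [
    "Apple pie recipe with fresh fruit and cinnamon",
    "Farmers expect a large apple harvest this fall",
]

signature = topic_signature(
    company_snippets, background_snippets, exclude=("apple",)
)
tech_tweet = "New iPhone from Apple looks great"
fruit_tweet = "Baked an apple pie with cinnamon today"
```

Calling `is_company(tech_tweet, signature)` and `is_company(fruit_tweet, signature)` shows how the same ambiguous name is separated by its surrounding terms. The ambiguous name is excluded from the signature because it appears in every candidate tweet and therefore cannot discriminate between senses.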
