Automatic Retrieval of SNS Opinion Document Using Machine Learning Technique

An Automatic Extraction Technique for SNS Opinion Documents Using Machine Learning

  • Jae-Young Chang (Department of Computer Engineering, Hansung University)
  • Received : 2013.08.26
  • Accepted : 2013.10.11
  • Published : 2013.10.31

Abstract

As social network services (SNS) have grown in popularity, much research has been devoted to analyzing SNS posts in order to gauge public opinion on specific issues. A key prerequisite for such analysis is separating opinion (subjective) documents from the rest (e.g., objective documents). In this paper, we propose a new method for retrieving opinion documents from Twitter. Searching for or classifying opinion documents in Twitter is difficult because publicly available Twitter documents for training are scarce. To address this, we first build a machine-learned sentiment classification model from external documents similar to Twitter posts, and then adapt that model to separate opinion documents from the others in Twitter. Experimental results show that the proposed method can be applied successfully to opinion classification.
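The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier is used or how the sentiment model is adapted, so everything below is an assumption for illustration only: a small multinomial Naive Bayes model is trained on hypothetical external documents labeled positive or negative, and a post is then treated as an opinion document when one polarity clearly dominates the posterior. The names `NaiveBayesSentiment`, `is_opinion`, the threshold of 0.7, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def posterior(self, text):
        """Return P(label | text); tokens outside the vocabulary are ignored."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.labels:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                smoothed = (self.word_counts[c][t] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            log_scores[c] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def is_opinion(model, text, threshold=0.7):
    """Assumed adaptation step: flag a post as opinionated when the
    sentiment model is confident about either polarity."""
    return max(model.posterior(text).values()) >= threshold


# Hypothetical external training documents (e.g., short reviews).
external_docs = [
    ("great movie loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible plot hated it", "neg"),
    ("boring and awful movie", "neg"),
]
texts, labels = zip(*external_docs)
model = NaiveBayesSentiment().fit(texts, labels)

for tweet in ["loved the great acting", "the train leaves at noon"]:
    print(tweet, "->", "opinion" if is_opinion(model, tweet) else "non-opinion")
```

In this sketch, a post with no sentiment-bearing vocabulary falls back to the class prior, so it stays below the threshold and is treated as a non-opinion document. A real system would need a far larger external corpus and a tuned threshold.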

