Efficient Retrieval of Short Opinion Documents Using Learning to Rank

  • Jae-Young Chang (Department of Computer Engineering, Hansung University)
  • Submitted: 2013.05.14
  • Reviewed: 2013.08.16
  • Published: 2013.08.31

Abstract

Recently, social network services (SNS) such as Twitter and Facebook have become popular, and research on opinion mining has been actively pursued. However, most current opinion mining research focuses on sentiment classification or feature selection, and there have been few studies on opinion document retrieval. In this paper, we propose a method for efficiently retrieving the documents a user wants from a collection of short opinion documents. The proposed method builds on existing sentiment classification methods. It also applies several document features to evaluate the quality of the opinion documents. To build the retrieval model, we adopt a learning-to-rank technique and integrate the sentiment classification model into the learning-to-rank model. Experimental results show that the proposed method can be applied successfully to opinion search.
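The abstract does not specify the sentiment classifier, the quality features, or the learning-to-rank algorithm. What follows is therefore only a minimal sketch of the general idea under stated assumptions:

- A multinomial Naive Bayes classifier scores how opinionated a document is.
- Its posterior becomes one feature column next to relevance and quality signals.
- A linear ranker is trained with the pairwise hinge-loss (Ranking SVM-style) objective by subgradient descent.

The names `NaiveBayesOpinion`, `features`, `train_rank_svm`, and `rank`, the `retweets` field, the three features, and all toy data are hypothetical. They are not taken from the paper.

```python
import math
from collections import Counter


class NaiveBayesOpinion:
    """Multinomial Naive Bayes; label 1 = opinionated, 0 = not (hypothetical labels)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)  # assumes both labels occur at least once
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_opinion(self, text):
        """Posterior P(label=1 | text) with add-one smoothing; unseen tokens are skipped."""
        total = sum(self.priors.values())
        log_score = {}
        for y in (0, 1):
            denom = sum(self.counts[y].values()) + len(self.vocab)
            s = math.log(self.priors[y] / total)
            for tok in text.lower().split():
                if tok in self.vocab:
                    s += math.log((self.counts[y][tok] + 1) / denom)
            log_score[y] = s
        # Sigmoid of the log-odds, clamped to avoid overflow in math.exp.
        d = max(-50.0, min(50.0, log_score[0] - log_score[1]))
        return 1.0 / (1.0 + math.exp(d))


def features(query, doc, clf):
    """Hypothetical feature vector: [query-term overlap, P(opinion), log(1 + retweets)]."""
    q = set(query.lower().split())
    d = set(doc["text"].lower().split())
    overlap = len(q & d) / len(q) if q else 0.0
    return [overlap, clf.prob_opinion(doc["text"]), math.log1p(doc["retweets"])]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_rank_svm(groups, epochs=200, lr=0.05, lam=0.01):
    """Linear pairwise ranker: minimizes lam/2*||w||^2 + mean pairwise hinge loss
    (the Ranking SVM objective) by full-batch subgradient descent.
    groups: list of (feature_vectors, relevance_grades), one entry per query."""
    pairs = []
    for xs, ys in groups:
        for i in range(len(xs)):
            for j in range(len(xs)):
                if ys[i] > ys[j]:  # document i should outrank document j
                    pairs.append([a - b for a, b in zip(xs[i], xs[j])])
    w = [0.0] * len(pairs[0])  # assumes at least one graded pair exists
    for _ in range(epochs):
        grad = [lam * wk for wk in w]
        for diff in pairs:
            if dot(w, diff) < 1.0:  # margin violated: hinge term is active
                grad = [g - dk / len(pairs) for g, dk in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


def rank(query, docs, clf, w):
    """Sort documents by the learned linear score, highest first."""
    scored = [(dot(w, features(query, d, clf)), d["text"]) for d in docs]
    return [text for _, text in sorted(scored, reverse=True)]


# --- Toy usage: every text, label, and retweet count below is made up ---
clf = NaiveBayesOpinion().fit(
    ["love this phone great camera", "terrible battery hate it",
     "awesome screen love it", "phone released on monday",
     "battery capacity is 3000 mah", "store opens at nine"],
    [1, 1, 1, 0, 0, 0],
)
query = "phone battery"
docs = [
    {"text": "love this phone battery great", "retweets": 20},
    {"text": "phone battery is 3000 mah", "retweets": 1},
    {"text": "store opens at nine", "retweets": 0},
]
xs = [features(query, d, clf) for d in docs]
w = train_rank_svm([(xs, [2, 1, 0])])  # hypothetical relevance grades
print(rank(query, docs, clf, w))
# prints ['love this phone battery great', 'phone battery is 3000 mah', 'store opens at nine']
```

In this sketch the classifier's posterior enters the ranker only as one feature column. The learned weight therefore decides how much opinion strength counts relative to query relevance and popularity. This is one plausible reading of "integrating the sentiment classification model into learning to rank", not necessarily the paper's exact scheme.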
