Latent topics-based product reputation mining

잠재 토픽 기반의 제품 평판 마이닝

  • Park, Sang-Min (Department of Software Convergence Engineering, Kunsan National University) ;
  • On, Byung-Won (Department of Software Convergence Engineering, Kunsan National University)
  • 박상민 (군산대학교 산학융합공과대학 소프트웨어융합공학과) ;
  • 온병원 (군산대학교 산학융합공과대학 소프트웨어융합공학과)
  • Received : 2017.03.08
  • Accepted : 2017.05.15
  • Published : 2017.06.30

Abstract

Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to gauge the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and accurately determine customer preferences. In existing data-based survey methods, a sentiment lexicon for a particular domain is first constructed by domain experts, who judge whether words frequently used in the collected text documents carry a positive, neutral, or negative meaning. To assess the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that express positive or negative opinions about the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want useful information on other aspects such as 'car quality', 'car performance', and 'car service'. Such information enables customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to identify the preference for, and the positive and negative points of, new models on the market, and sentiment analysis can guide improvements to the weak points of those models. To this end, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary containing the overall positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is insufficient; the main aspects of the target product need to be considered in the sentiment analysis. (2) Since the same word generally has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the per-domain sentiment lexicon is required because lexicon construction is labor-intensive and time-consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we improve the accuracy of product reputation mining over that of the existing approach. In the experiments, we collected a large number of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
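
The aspect-level steps described in the abstract (polarity classification with a lexicon, positive and negative ratios per aspect, and top-k sentences per aspect) can be illustrated with a small Python sketch. This is not the authors' implementation: the lexicon entries, the aspect keyword sets, and all function names are hypothetical, and simple keyword overlap is used here as a stand-in for the topic extraction step.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word -> sentiment score (positive > 0, negative < 0).
LEXICON = {"sleek": 1.0, "beautiful": 1.0, "smooth": 1.0, "quiet": 0.5,
           "ugly": -1.0, "noisy": -1.0, "slow": -0.5, "rude": -1.0}

# Hypothetical aspect keywords, standing in for the top words of mined topics.
ASPECT_KEYWORDS = {
    "design": {"design", "exterior", "interior"},
    "performance": {"engine", "acceleration", "ride", "handling"},
    "service": {"service", "dealer", "repair", "staff"},
}

def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())

def sentence_score(tokens):
    """Sum lexicon scores; words missing from the lexicon count as neutral."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def assign_aspect(tokens):
    """Return the aspect with the largest keyword overlap, or None."""
    best, best_overlap = None, 0
    token_set = set(tokens)
    for aspect, keywords in ASPECT_KEYWORDS.items():
        overlap = len(keywords & token_set)
        if overlap > best_overlap:
            best, best_overlap = aspect, overlap
    return best

def mine_reputation(sentences, k=1):
    """Compute per-aspect positive/negative ratios and top-k sentences."""
    by_aspect = defaultdict(list)
    for s in sentences:
        tokens = tokenize(s)
        aspect = assign_aspect(tokens)
        if aspect is not None:  # sentences matching no aspect are skipped
            by_aspect[aspect].append((sentence_score(tokens), s))
    report = {}
    for aspect, scored in by_aspect.items():
        pos = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        neg = sorted((p for p in scored if p[0] < 0), key=lambda p: p[0])
        report[aspect] = {
            "positive_ratio": len(pos) / len(scored),
            "negative_ratio": len(neg) / len(scored),
            "top_positive": [s for _, s in pos[:k]],
            "top_negative": [s for _, s in neg[:k]],
        }
    return report

if __name__ == "__main__":
    reviews = [
        "The exterior design is sleek and beautiful.",
        "The interior looks ugly.",
        "The engine is noisy and slow.",
        "The ride is smooth.",
    ]
    for aspect, summary in mine_reputation(reviews).items():
        print(aspect, summary)
```

Dividing by the number of sentences within each aspect, rather than by the size of the whole corpus, is what distinguishes this aspect-level digest from a single corpus-wide ratio; a real system would replace the keyword sets with aspects derived from a topic model.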

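Data-driven analysis techniques have recently become widely used in public opinion surveys. To investigate the preference for a recently launched product, companies need a way to collect and analyze the various kinds of data available online and accurately grasp public preferences, rather than simply compiling conventional survey results or expert opinions. The main existing approach first builds a sentiment lexicon for the domain: experts compile the high-frequency words from the collected text documents and judge each as positive, negative, or neutral. To determine the preference for a particular product, review posts about the product are collected and sentences are extracted; the sentiment lexicon is used to judge each sentence as positive, negative, or neutral; and the preference for the product is finally measured from the numbers of positive and negative sentences. The approach also automatically summarizes the positive and negative content about the product by computing sentiment scores for the sentences and extracting those with high positive and negative scores. In this study, we propose a product reputation mining algorithm that extracts the topics hidden in documents produced by the general public, surveys the preference for a given product, and presents a summary of the positive and negative content of each topic. Unlike the existing approach, it uses topics to build the sentiment lexicon easily and quickly, and it refines the extracted topics to raise the accuracy of the product preference and summary results. In the experiments, we collected a large number of review posts on domestically produced cars such as the K5, SM5, and Avante, and demonstrated the effectiveness of the proposed approach by measuring the cars' positive and negative ratios, summarizing the positive and negative content, and conducting statistical tests.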
