A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining

A Study on a Methodology for Deriving Purpose-Specific Customized Information by Extracting Online Product Reviews Based on Text Mining

  • Kim, Joo Young (Dept. of Industrial and Information Systems Engineering, Soongsil University)
  • Kim, Dong soo (Dept. of Industrial and Information Systems Engineering, Soongsil University)
  • Received : 2016.04.18
  • Accepted : 2016.05.23
  • Published : 2016.05.31

Abstract

In the Web 2.0 era, characterized by openness, sharing, and participation, internet users can easily produce and share data. Unstructured data, which accounts for most of the world's digital data, has grown exponentially. Personal online product reviews, one kind of unstructured data, are valuable both to the companies that make the products and to potential customers interested in them. Extracting useful information from a large volume of scattered review data requires collecting, storing, preprocessing, and analyzing the data, and then drawing conclusions. We therefore introduce a text-mining methodology, implemented in R, that applies natural language processing to text data such as product reviews in order to extract structured data. We also introduce a data-mining approach for deriving purpose-specific customized information from the structured review information produced by text mining.

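With the arrival of the Web 2.0 era, characterized by openness, sharing, and participation, it has become easy for internet users to produce and share data. As data has grown exponentially, so has the volume of unstructured data, which makes up most digital information. Among the unstructured data created on the internet in natural-language form with no fixed format, reviews in which individuals evaluate specific products are data needed by the companies concerned and by potential customers interested in those products. Obtaining useful product information from a large volume of review data requires data collection, storage, preprocessing, analysis, and the drawing of conclusions. This study therefore introduces a method that uses text mining in R to derive structured data values from unstructured text by applying natural language processing and document processing techniques. It also proposes a way to apply data mining techniques to the derived structured review information in order to produce review information customized to a given purpose.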
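The step from free-text reviews to structured data can be illustrated with a minimal sketch. The study itself uses R; the Python below is only a stand-in, not the authors' implementation. The sample reviews, the stop-word list, and every function name are hypothetical and chosen for illustration. It tokenizes each review, removes stop words, and builds a document-term matrix that later data-mining steps could consume.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the study's actual data set is not shown here.
REVIEWS = [
    "The battery life is great and the screen is bright.",
    "Battery drains fast, but the camera is great.",
    "Great camera, poor battery.",
]

# Small illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "is", "and", "but", "a", "an"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def document_term_matrix(reviews):
    """Return (sorted vocabulary, one row of term counts per review)."""
    counts = [Counter(tokenize(r)) for r in reviews]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def term_totals(vocab, rows):
    """Sum each column to get corpus-wide term frequencies."""
    return {term: sum(row[i] for row in rows) for i, term in enumerate(vocab)}


vocab, rows = document_term_matrix(REVIEWS)
totals = term_totals(vocab, rows)
```

Each row of `rows` is one review expressed as counts over `vocab`, which is the kind of structured representation that a purpose-specific analysis (for example, filtering on terms relevant to one product attribute) could start from.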
