Topic Modeling with Deep Learning-based Sentiment Filters

Korean title: 감정 딥러닝 필터를 활용한 토픽 모델링 방법론 (A Topic Modeling Methodology Using Sentiment Deep Learning Filters)

  • 최병설 (Graduate School of Business IT, Kookmin University)
  • 김남규 (School of Management Information Systems, Kookmin University)
  • Received : 2019.12.02
  • Accepted : 2019.12.18
  • Published : 2019.12.31

Abstract

Purpose: This study proposes a methodology that uses deep learning to classify reviews as positive or negative, derives positive and negative keywords from that classification, and then uses these keywords to refine the results of topic modeling.

Design/methodology/approach: We extracted topic keywords with LDA-based topic modeling. In parallel, we applied an attention-based deep learning model to identify positive and negative keywords. Finally, we refined the topic keywords by using the positive and negative keywords as filters.

Findings: We collected and analyzed about 6,000 English-language reviews of Gyeongbokgung, a major tourist attraction in Korea, from the travel site Tripadvisor. The experimental results show that the proposed methodology properly identifies the positive and negative keywords that describe the major topics.
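
The final refinement step, in which sentiment keywords act as filters on topic keywords, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the LDA step has already produced a keyword list per topic and that the attention-based classifier has already produced a set of positive keywords and a set of negative keywords. The function name `filter_topic_keywords` and all sample words are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def filter_topic_keywords(
    topic_keywords: Dict[str, List[str]],
    positive: Set[str],
    negative: Set[str],
) -> Dict[str, Dict[str, List[str]]]:
    """Split each topic's keywords into positive and negative subsets.

    Keywords found in neither sentiment set are dropped; the original
    keyword order within each topic is preserved.
    """
    refined = {}
    for topic, words in topic_keywords.items():
        refined[topic] = {
            "positive": [w for w in words if w in positive],
            "negative": [w for w in words if w in negative],
        }
    return refined


# Toy inputs, made up for illustration (not results from the paper).
# topic_keywords stands in for the output of LDA-based topic modeling.
topic_keywords = {
    "topic_0": ["guard", "ceremony", "crowded", "beautiful"],
    "topic_1": ["ticket", "hanbok", "free", "queue"],
}
# These sets stand in for keywords identified by the attention-based model.
positive_keywords = {"beautiful", "free", "ceremony"}
negative_keywords = {"crowded", "queue"}

refined = filter_topic_keywords(topic_keywords, positive_keywords, negative_keywords)
for topic, groups in refined.items():
    print(topic, groups)
```

In this sketch the filter is a plain set-membership test, so a word that appears in both sentiment sets would be listed under both labels. A real pipeline would need its own rule for such cases.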
