Sentiment Analysis Using Deep Learning Model based on Phoneme-level Korean

한글 음소 단위 딥러닝 모형을 이용한 감성분석

  • 이재준 (Department of Data Science, Graduate School, Kookmin University) ;
  • 권순범 (School of Business Administration, Kookmin University) ;
  • 안성만 (School of Business Administration, Kookmin University)
  • Received : 2017.12.22
  • Accepted : 2018.03.06
  • Published : 2018.03.31

Abstract

Sentiment analysis is a text-mining technique that extracts the feelings of the person who wrote a piece of text, such as a movie review. Earlier research on sentiment analysis identified sentiments using dictionaries of positive and negative words collected in advance. As research on deep learning has become active, sentiment analysis using deep learning models with morpheme- or word-level input has also been carried out. However, such models have the disadvantage that the word dictionary varies by domain and the number of morphemes or words is much larger than the number of phonemes, so the dictionary becomes large and the complexity of the model increases accordingly. We construct a sentiment analysis model using a recurrent neural network whose input is split into phoneme-level units, which are smaller than morphemes. To verify its performance, we use 30,000 movie reviews from Naver, Korea's largest portal. A morpheme-level sentiment analysis model is also implemented for comparison. The results show that the phoneme-level model outperforms the morpheme-level model, and in particular the phoneme-level model using LSTM performs better than the one using GRU. We expect that phoneme-level processing of Korean text can be applied to various text mining tasks and language models.
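To make the phoneme-level input representation concrete, the sketch below shows one way to decompose precomposed Hangul syllables into their constituent jamo (initial, medial, and final phonemes) via Unicode arithmetic and feed the resulting index sequences to a small LSTM classifier in Keras. This is an illustrative sketch under stated assumptions, not the authors' implementation: the toy reviews, hyperparameters, padding length, and vocabulary-building step are all assumptions made here for demonstration.

```python
# A minimal sketch (not the paper's exact model) of phoneme-level Korean
# sentiment analysis: Hangul syllables are decomposed into jamo, index-encoded,
# and fed to a small LSTM binary classifier.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

CHO = [chr(0x1100 + i) for i in range(19)]           # 19 initial consonants
JUNG = [chr(0x1161 + i) for i in range(21)]          # 21 vowels
JONG = [''] + [chr(0x11A8 + i) for i in range(27)]   # 27 final consonants + "none"

def to_jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 11171:
            out.append(CHO[code // 588])          # 588 = 21 vowels * 28 finals
            out.append(JUNG[(code % 588) // 28])
            if code % 28:
                out.append(JONG[code % 28])
        else:
            out.append(ch)                        # keep spaces/non-Hangul as-is
    return out

# Toy labelled reviews (1 = positive, 0 = negative), standing in for the
# 30,000 Naver movie reviews used in the paper.
reviews = ["정말 재미있는 영화였다", "지루하고 최악이었다"]
labels = np.array([1, 0])

# Build a phoneme vocabulary; index 0 is reserved for padding.
jamo_seqs = [to_jamo(r) for r in reviews]
vocab = {j: i + 1 for i, j in enumerate(sorted({j for s in jamo_seqs for j in s}))}
X = pad_sequences([[vocab[j] for j in s] for s in jamo_seqs], maxlen=100)

# Small embedding + LSTM classifier over phoneme sequences.
model = Sequential([
    Embedding(input_dim=len(vocab) + 1, output_dim=32, mask_zero=True),
    LSTM(64),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, labels, epochs=3, batch_size=2, verbose=0)
```

Replacing LSTM(64) with a GRU layer of the same size gives the GRU variant compared in the paper; on real data the toy review list would be replaced by the labelled Naver corpus.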
