Multi-Document Summarization Method Based on Semantic Relationship using VAE

  • Baek, Su-Jin (Dept. of Information Communication, Yong-In Songdam College)
  • Received: 2017.11.02
  • Accepted: 2017.12.20
  • Published: 2017.12.28

Abstract

As the volume of document data grows, users need summarized information to understand the documents. However, existing document summarization methods rely on overly simple statistics, so research on multi-document summarization that addresses sentence ambiguity and generates meaningful sentences remains insufficient. In this paper, we apply a preprocessing step that identifies semantic connection relationships and handles unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that uses a variational autoencoder (VAE) to increase the semantic connectivity between sentences. Using the word vectors that make up each sentence, the method learns from the compressed information generated as latent variables and from an attribute discriminator, and then reconstructs sentences, producing summaries whose semantic connections read naturally. Compared with other document summarization methods, the proposed method showed a slight but consistent improvement in performance, demonstrating that semantic sentence generation and connectivity can be increased. In future work, we plan to experiment with various attribute settings to study how semantic connection relationships can be extended.
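
The abstract does not specify the model's architecture or loss, so the following is only a minimal sketch of the standard VAE objective that the "compressed information generated as latent variables" refers to. It is not the author's implementation. It assumes a diagonal Gaussian latent and, for simplicity, a squared-error reconstruction term over a sentence vector; text VAEs more commonly use a token-level cross-entropy term. The function names and the example values are hypothetical, and the attribute discriminator is omitted.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), the VAE regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction error plus KL term.

    Assumption: a Gaussian decoder with unit variance, so the
    reconstruction term is half the squared error (constants dropped).
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_to_standard_normal(mu, logvar)


# Hypothetical encoder outputs for a 2-dimensional latent space.
rng = random.Random(0)
mu, logvar = [0.5, -0.2], [0.0, -1.0]
z = reparameterize(mu, logvar, rng)
print(len(z), kl_to_standard_normal(mu, logvar))
```

Sampling through `reparameterize` keeps the randomness in `eps`, which is what lets gradients flow to `mu` and `logvar` when the same computation is written in an automatic-differentiation framework.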
