Multi-Document Summarization Method Based on Semantic Relationship using VAE

VAE를 이용한 의미적 연결 관계 기반 다중 문서 요약 기법

  • Received : 2017.11.02
  • Accepted : 2017.12.20
  • Published : 2017.12.28

Abstract

As the volume of document data grows, users need summarized information to understand documents quickly. However, existing document summarization methods rely on overly simple statistics, and multi-document summarization research has not adequately addressed sentence ambiguity or the generation of meaningful sentences. In this paper, we study semantic connections between sentences together with a preprocessing step that filters out unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that uses a variational autoencoder (VAE) to strengthen the semantic connectivity between sentences. Taking sentence word vectors as input, the model learns from the compressed information and the attribute discriminators generated as latent variables, then reconstructs sentences; semantic connection processing then produces a natural summary sentence. Compared with other document summarization methods, the proposed method showed a small improvement in performance, indicating that it can improve semantic sentence generation and connectivity. In future work, we will study how to extend semantic connections by experimenting with various attribute settings.
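
The abstract does not give the model's equations, so the following is only a minimal sketch of the latent-variable step found in a standard VAE, not the authors' implementation. It shows how a sentence encoding could be compressed into a latent sample and how the usual regularization term is computed. The function names, the three-dimensional latent space, and the numeric values are all assumptions chosen for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one sentence (illustrative values only).
mu = [0.5, -1.0, 0.0]
log_var = [0.0, -0.5, 0.2]

rng = random.Random(0)  # fixed seed so the sample is reproducible
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, a decoder would reconstruct the sentence from `z` (possibly alongside attribute codes), and `kl` would be added to the reconstruction loss during training.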

Keywords

VAE; Multi-Document Summarization; Natural Language Processing; Deep Learning; Semantic Relationship
