Joint Hierarchical Semantic Clipping and Sentence Extraction for Document Summarization

  • Yan, Wanying (College of Information Engineering and Automation, Kunming University of Science and Technology) ;
  • Guo, Junjun (College of Information Engineering and Automation, Kunming University of Science and Technology) ;
  • Received : 2020.03.20
  • Accepted : 2020.05.24
  • Published : 2020.08.31

Abstract

Extractive document summarization aims to select a few sentences from a given document while preserving its main information, but current extractive methods do not address the problem of repeated sentence information, which is especially pronounced in news document summarization. Given both the importance and the redundancy of information in news text, we propose a neural extractive summarization approach with joint sentence semantic clipping and selection, which effectively reduces sentence repetition in news summaries. Specifically, a hierarchical selective encoding network is constructed to build both sentence-level and document-level representations and to retain the most informative content of the news text; a sentence extractor then performs joint scoring and redundant-information clipping. In this way, our model strikes a balance between extracting important information and filtering redundant information. Experimental results on the CNN/Daily Mail dataset and a Court Public Opinion News dataset we built demonstrate the effectiveness of the proposed approach in terms of ROUGE metrics, especially for redundant information filtering.
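The abstract describes a two-stage pipeline: a hierarchical selective encoder that builds word-, sentence-, and document-level representations, followed by an extractor that jointly scores sentences and clips redundant ones. Below is a minimal, illustrative PyTorch sketch of that idea; the module structure, layer sizes, the mean-pooled sentence vectors, and the cosine-similarity redundancy penalty (an MMR-style stand-in for "semantic clipping") are all assumptions made for exposition, not the authors' actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalSelectiveEncoder(nn.Module):
    """Word-level BiGRU -> selective gate -> sentence-level BiGRU (illustrative sizes)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.gate = nn.Linear(4 * hid_dim, 2 * hid_dim)   # selective gate over word states
        self.sent_rnn = nn.GRU(2 * hid_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, docs):                               # docs: (batch, n_sents, n_words)
        b, n, w = docs.size()
        words, _ = self.word_rnn(self.embed(docs.view(b * n, w)))
        sent_vec = words.mean(dim=1)                       # crude sentence summary vector
        # Selective gate: filter each word state by its sentence-level context.
        gate = torch.sigmoid(self.gate(torch.cat(
            [words, sent_vec.unsqueeze(1).expand_as(words)], dim=-1)))
        sent_repr = (gate * words).mean(dim=1).view(b, n, -1)
        doc_repr, _ = self.sent_rnn(sent_repr)             # document-level sentence states
        return doc_repr

def extract(doc_repr, scorer, k=3, redundancy_weight=0.5):
    """Greedy joint selection: pick high-scoring sentences while penalizing those
    too similar to sentences already selected (redundant-information clipping)."""
    scores = scorer(doc_repr).squeeze(-1)                  # (batch, n_sents)
    selected = []
    for _ in range(k):
        adjusted = scores.clone()
        if selected:
            chosen = doc_repr[:, selected, :]              # (batch, |S|, dim)
            sim = F.cosine_similarity(doc_repr.unsqueeze(2),
                                      chosen.unsqueeze(1), dim=-1)
            adjusted = adjusted - redundancy_weight * sim.max(dim=2).values
            adjusted[:, selected] = float('-inf')          # never re-pick a sentence
        selected.append(adjusted[0].argmax().item())       # assumes batch size 1
    return selected

# Illustrative usage with random token ids (hypothetical shapes and vocabulary):
encoder = HierarchicalSelectiveEncoder(vocab_size=30000)
scorer = nn.Linear(2 * 256, 1)                             # maps each sentence state to a score
doc_repr = encoder(torch.randint(0, 30000, (1, 12, 30)))   # 1 document, 12 sentences, 30 words each
print(extract(doc_repr, scorer, k=3))

The greedy loop above is only a simple surrogate for the paper's joint scoring-and-clipping strategy: at each step a sentence's salience score is traded off against its maximum similarity to the sentences already chosen, which is one way to balance important-information extraction against redundant-information filtering.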
