Unsupervised Abstractive Summarization Method Suitable for Documents with Flows

  • Received : 2021.10.05
  • Accepted : 2021.10.27
  • Published : 2021.11.30

Abstract

Recently, Transformer-based encoder-decoder techniques have produced a breakthrough in NLP. However, they can be used only for mainstream languages such as English and Chinese, where datasets with millions of examples are readily available, and they cannot be applied to non-mainstream languages for which no such datasets have been built. In addition, machine summarization suffers from a bias problem: it focuses on the beginning of the document, so these methods are not suitable for documents with a narrative flow, such as fairy tales and novels. In this paper, we propose a hybrid summarization method that requires no dataset and mitigates the bias problem using a GAN with two adaptive discriminators. We evaluate our model on the CNN/Daily Mail dataset to verify its objective validity, and we also show that the model achieves valid performance in Korean, one of the non-mainstream languages.

Recently, encoder-decoder-based summarization has nearly reached human level. However, it can be used only in mainstream languages such as English and Chinese, where datasets with millions of examples are well established, and it cannot be used in non-mainstream languages where no such datasets have been built. It also has a bias problem of summarizing with a focus on only part of the document, making it unsuitable for documents with a narrative flow such as fairy tales and novels. In this paper, we propose a hybrid summarization method that combines extractive and abstractive summarization: it performs unsupervised abstractive summarization with a GAN that has two discriminators, and it mitigates the bias problem by extracting guide tokens and injecting them into the generator. We evaluate the model on the CNN/Daily Mail dataset to verify its objective validity, and we demonstrate that it also shows valid performance in Korean, one of the non-mainstream languages.
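The page does not include the paper's code, so the following is only a minimal sketch of how segment-wise guide-token extraction could counter lead bias. It assumes precomputed sentence embeddings (for example, Sentence-BERT-style vectors); the function name, the segmentation scheme, and the centroid-based centrality score are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def extract_guide_tokens(sent_vecs, sents, n_segments=4, per_segment=1):
    """Pick the most central sentence of each document segment.

    Drawing guidance from every segment, rather than scoring the
    document globally, is what keeps the summary from fixating on
    the beginning of a document with a narrative flow.
    """
    picks = []
    seg = max(1, len(sents) // n_segments)
    for start in range(0, len(sents), seg):
        block = sent_vecs[start:start + seg]
        centroid = block.mean(dim=0, keepdim=True)        # segment "topic"
        scores = F.cosine_similarity(block, centroid)     # centrality score
        top = scores.topk(min(per_segment, len(block))).indices + start
        picks.extend(top.tolist())
    return [sents[i] for i in sorted(picks)]

# Demo with random stand-in embeddings; in practice sent_vecs would come
# from a sentence encoder such as Sentence-BERT.
sents = [f"sentence {i}" for i in range(16)]
print(extract_guide_tokens(torch.randn(16, 8), sents))
```

Because each segment must contribute at least one guide, the extractive step spreads attention across the whole document by construction, and the abstractive step is then conditioned on those guides.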
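The abstract also does not spell out the adversarial objective, so the next block is only a plausible skeleton: a small seq2seq generator trained without reference summaries against two discriminators, one rewarding fluent text and one rewarding consistency with the source document. The module shapes, the role split between the two discriminators, and the REINFORCE-style workaround for non-differentiable token sampling are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySeq2Seq(nn.Module):
    """Toy generator: encode the source tokens, decode summary logits."""
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt_in):
        _, h = self.enc(self.emb(src))
        y, _ = self.dec(self.emb(tgt_in), h)
        return self.out(y)                                   # (B, T, vocab)

class SeqDiscriminator(nn.Module):
    """Scores a token sequence in (0, 1). Instantiated twice: once to judge
    fluency of the summary alone, once to judge whether the concatenated
    [document ; summary] pair is semantically consistent."""
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens):
        _, h = self.gru(self.emb(tokens))
        return torch.sigmoid(self.score(h[-1])).squeeze(-1)  # (B,)

def generator_step(gen, d_fluency, d_recon, src, opt, sum_len=12):
    """One unsupervised generator update. Token sampling is not
    differentiable, so the discriminator scores are used as a reward in a
    REINFORCE-style policy gradient, a common workaround in text GANs."""
    bos = torch.zeros(src.size(0), 1, dtype=torch.long)      # id 0 as <bos>
    # Document prefix as a crude stand-in for autoregressive decoder input.
    logits = gen(src, torch.cat([bos, src[:, :sum_len - 1]], dim=1))
    probs = F.softmax(logits, dim=-1)
    sample = torch.multinomial(probs.reshape(-1, probs.size(-1)), 1)
    sample = sample.view(src.size(0), -1)                    # (B, T) summary
    reward = d_fluency(sample) + d_recon(torch.cat([src, sample], dim=1))
    logp = torch.log(probs.gather(-1, sample.unsqueeze(-1)).squeeze(-1) + 1e-9)
    loss = -(reward.detach().unsqueeze(1) * logp).mean()     # policy gradient
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Smoke test on a fake token batch.
vocab = 1000
gen = TinySeq2Seq(vocab)
d_flu, d_rec = SeqDiscriminator(vocab), SeqDiscriminator(vocab)
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
doc = torch.randint(1, vocab, (2, 40))
print(generator_step(gen, d_flu, d_rec, doc, opt))
```

A full training loop would alternate this step with discriminator updates on real sentences (for fluency) and on genuine document-summary pairs (for reconstruction), which this sketch omits.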
