Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words

  • Received : 2021.04.23
  • Accepted : 2021.12.03
  • Published : 2022.06.30

Abstract

Text summarization is the task of producing a shorter version of a long document while accurately preserving the main contents of the original text. Abstractive summarization generates novel words and phrases using a language generation method through text transformation and prior-embedded word information. However, newly coined words and other out-of-vocabulary words degrade the performance of automatic summarization because they are not pre-trained in the machine learning process. In this study, we demonstrate an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The recall-based word-generation metric ROUGE-1 score was 55.11, and the word-order-based ROUGE-L score was 39.65.
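The central preprocessing idea in the abstract is to mask out-of-vocabulary (OOV) words before obtaining BERT's contextualized embeddings. The snippet below is a minimal sketch of that step, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint; the function name, the whole-word OOV test, and the example sentence are illustrative assumptions rather than the authors' exact pipeline.

```python
# Minimal sketch (assumption): mask whole words missing from BERT's WordPiece
# vocabulary, then compute contextualized embeddings. Requires `torch` and
# `transformers`; bert-base-uncased stands in for whatever encoder the paper used.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def mask_oov_and_embed(text: str) -> torch.Tensor:
    """Replace OOV words with [MASK] and return BERT's contextualized embeddings."""
    words = text.split()
    # Treat a word as OOV if it is not a single WordPiece vocabulary entry
    # (a simplification; the paper's OOV criterion may differ).
    masked = [w if w.lower() in tokenizer.vocab else tokenizer.mask_token for w in words]
    inputs = tokenizer(" ".join(masked), return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, sequence_length, hidden_size); one contextual vector per token.
    return outputs.last_hidden_state

# Example: the coined name "Xylogics" is unlikely to be in the vocabulary, so it is masked.
embeddings = mask_oov_and_embed("Xylogics announced a new product")
print(embeddings.shape)
```

In the paper's setting, these contextualized vectors would then feed the summarizer's pointing/copy decision for masked positions; the code above only illustrates the masking and embedding step.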

Keywords

Acknowledgement

This research was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (No. NRF-2019S1A5A2A03046571), and by the Korea Institute of Science and Technology Information (No. K-21-L01-C06-S01). Our research on BERT embedding, the selective-pointing mechanism, OOV masking, and deep learning methods contributes to solving the bias and fairness problem in AI systems.
