Title Generation Model for which Sequence-to-Sequence RNNs with Attention and Copying Mechanisms are used

  • Hyeon-gu Lee (Dept. of Computer and Communications Engineering, Kangwon National University)
  • Harksoo Kim (Dept. of Computer and Communications Engineering, Kangwon National University)
  • Received : 2017.02.01
  • Accepted : 2017.05.13
  • Published : 2017.07.15

Abstract

In big-data environments where large numbers of text documents are produced daily, titles are important clues for quickly grasping the key ideas of a document; however, many document types, such as blog articles and social-media messages, lack titles. This paper proposes a title-generation model that uses sequence-to-sequence RNNs with attention and copying mechanisms. In the proposed model, input sentences are encoded with bidirectional GRU (gated recurrent unit) networks, and title words are generated by decoding the encoded sentences together with keywords that are automatically selected from the input sentences. In experiments with 93,631 training documents and 500 test documents, the attention mechanism achieved higher lexical-overlap scores (ROUGE-1: 0.1935, ROUGE-2: 0.0364, ROUGE-L: 0.1555) than the copying mechanism, whereas the copying mechanism performed better in the human qualitative evaluation.
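The abstract describes the architecture only at a high level. As a rough illustration, a minimal PyTorch sketch of a bidirectional GRU encoder feeding a GRU decoder with additive (Bahdanau-style) attention might look as follows; all class names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' code) of a bidirectional GRU
# encoder and an attention-based GRU decoder for title generation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU reads the input sentence forwards and backwards.
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                        # src: (batch, src_len)
        outputs, hidden = self.gru(self.embed(src))
        # outputs: (batch, src_len, 2*hid_dim); join the final forward and
        # backward states into one vector that initializes the decoder.
        return outputs, torch.cat([hidden[0], hidden[1]], dim=-1)

class AttnDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.score = nn.Linear(2 * hid_dim, 1)     # additive attention score
        self.cell = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def step(self, prev_word, hidden, enc_outputs):
        # Compare the decoder state with every encoder state ...
        h = hidden.unsqueeze(1).expand(-1, enc_outputs.size(1), -1)
        scores = self.score(torch.cat([h, enc_outputs], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)        # attention distribution
        # ... and build a context vector as their weighted sum.
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        hidden = self.cell(torch.cat([self.embed(prev_word), context], -1), hidden)
        return self.out(hidden), hidden, weights   # logits over title words

# Toy usage with random token ids.
enc, dec = Encoder(vocab_size=30000), AttnDecoder(vocab_size=30000)
src = torch.randint(0, 30000, (2, 20))             # two toy input sentences
enc_out, hidden = enc(src)
logits, hidden, attn = dec.step(src[:, 0], hidden, enc_out)
```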

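The copying mechanism is likewise not spelled out in the abstract. One common realization, sketched below in the pointer-generator style (which may differ from the paper's exact formulation), mixes the decoder's vocabulary distribution with a copy distribution obtained by scattering the attention weights back onto the input word ids; the gate features and names are assumptions.

```python
import torch
import torch.nn as nn

class CopyGate(nn.Module):
    """Mixes generating from the vocabulary with copying input words (sketch)."""
    def __init__(self, hid_dim=512, emb_dim=128):
        super().__init__()
        # Gate computed from context vector, decoder state, and input embedding.
        self.gate = nn.Linear(2 * hid_dim + emb_dim, 1)

    def forward(self, vocab_dist, attn_weights, src_ids, context, hidden, emb):
        # vocab_dist: (batch, vocab), a softmax over the output vocabulary.
        # attn_weights: (batch, src_len); src_ids: (batch, src_len).
        p_gen = torch.sigmoid(self.gate(torch.cat([context, hidden, emb], -1)))
        # Each input position donates its attention mass to the word it holds,
        # so salient source words (e.g., selected keywords) are easy to copy.
        copy_dist = torch.zeros_like(vocab_dist)
        copy_dist.scatter_add_(1, src_ids, attn_weights)
        return p_gen * vocab_dist + (1 - p_gen) * copy_dist
```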

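For context on the reported scores, ROUGE-1 and ROUGE-2 measure unigram and bigram recall against the reference title, and ROUGE-L measures recall of the longest common subsequence. A minimal sketch, assuming whitespace-tokenized titles:

```python
from collections import Counter

def rouge_n(reference, candidate, n):
    # n-gram recall: overlapping n-grams / n-grams in the reference.
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    return sum((ref & cand).values()) / max(sum(ref.values()), 1)

def rouge_l(reference, candidate):
    # Recall of the longest common subsequence, via dynamic programming.
    m, n = len(reference), len(candidate)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if reference[i] == candidate[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n] / max(m, 1)

# Toy example with hypothetical titles.
ref = "neural title generation with attention".split()
hyp = "title generation using attention".split()
print(rouge_n(ref, hyp, 1), rouge_n(ref, hyp, 2), rouge_l(ref, hyp))
```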

Acknowledgement

Grant: Development of Big-Data Automatic Tagging and Tag-Based DaaS System

Supported by: Institute for Information & Communications Technology Promotion (IITP), National Research Foundation of Korea (NRF)
