A Study of Pre-trained Language Models for Korean Language Generation

  • Song, Minchae (Department of Big Data Strategy, Nonghyup)
  • Shin, Kyung-shik (School of Business, Ewha Womans University)
  • Received : 2022.11.14
  • Reviewed : 2022.12.14
  • Published : 2022.12.31

Abstract

This study empirically analyzed which characteristics of Korean pre-trained language models (PLMs) suit the objective of a natural language processing task and the nature of its inference data. Two representative PLM architectures capable of natural language generation, BART and GPT, were used in the experiments: abstractive summarization performance was compared between KoBART and KoGPT, PLMs pre-trained on Korean text with the BART and GPT architectures, respectively. To examine how performance depends on the characteristics of the inference data, the models were applied to Korean documents of ten types, six informational and four creative. Across all document types, KoBART, whose encoder-decoder architecture can both understand and generate language, outperformed the decoder-only KoGPT. Looking more closely at the effect of inference-data characteristics, KoGPT's performance was proportional to document length; even on the longest documents, however, KoBART still outperformed KoGPT, indicating that the factor with the greatest influence on generation performance is not the size of the training data or the number of model parameters but how well the model's architecture fits the downstream task. Beyond the informational/creative distinction, the documents were also characterized by their part-of-speech (POS) composition. KoBART's performance fell as the share of endings, adjectives/adverbs, and verbs rose and improved as the share of nouns rose, whereas KoGPT's performance was only weakly correlated with POS shares. Because generation performance thus varies with the characteristics of the inference data even for the same PLM, those characteristics should be considered alongside fine-tuning when a PLM is applied to a downstream task. Analyzing further properties of the inference data, such as word order, and their effect on Korean generation could support the development of language models and evaluation metrics better suited to Korean.
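To make the experimental pipeline described above concrete, the sketch below shows the two measurements in miniature: generating an abstractive summary with a Korean BART checkpoint and computing the POS shares of the input document. This is not the authors' code; the Hugging Face checkpoint name and the KoNLPy Okt tagger are illustrative assumptions standing in for the KoBART/KoGPT checkpoints and morphological analyzer actually used in the study.

    # Minimal sketch, not the authors' implementation. Assumptions: the public
    # "gogamza/kobart-summarization" checkpoint stands in for the paper's KoBART
    # model, and KoNLPy's Okt tagger stands in for whatever POS analyzer was used.
    from collections import Counter

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    from konlpy.tag import Okt

    CHECKPOINT = "gogamza/kobart-summarization"  # assumed illustrative checkpoint
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

    def summarize(document: str, max_summary_tokens: int = 128) -> str:
        """Generate an abstractive summary of a Korean document with beam search."""
        inputs = tokenizer(document, return_tensors="pt", max_length=1024, truncation=True)
        summary_ids = model.generate(
            inputs["input_ids"],
            num_beams=4,
            max_length=max_summary_tokens,
            early_stopping=True,
        )
        return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    def pos_shares(document: str) -> dict:
        """Fraction of tokens per POS tag (nouns, verbs, endings, ...) in a document."""
        tags = [tag for _, tag in Okt().pos(document)]
        counts = Counter(tags)
        total = sum(counts.values())
        return {tag: n / total for tag, n in counts.items()}

    if __name__ == "__main__":
        doc = "한국어 사전훈련 언어모델은 문서요약과 같은 자연어생성 태스크에 널리 활용된다."
        print(summarize(doc))
        print(pos_shares(doc))

A decoder-only comparison along the lines of KoGPT would instead prompt with the document and decode a continuation; correlating the POS shares with a summary-quality metric then reproduces, in outline, the analysis reported above.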

Keywords

Funding

This research was supported by the National Research Foundation of Korea (NRF-2021R1A2C1012036).
