A Survey on Deep Learning-based Pre-Trained Language Models

  • 박상언 (Major in Management Information, Division of ICT Convergence, College of Software Management, Kyonggi University)
  • Received : 2022.11.14
  • Accepted : 2022.12.05
  • Published : 2022.12.31

Abstract

Pre-trained language models are the most important and most widely used tools in natural language processing. Because they have been pre-trained on a large corpus, high performance can be expected even after fine-tuning with only a small amount of task-specific data, and because the elements needed for implementation, such as a pre-trained tokenizer and a deep learning model with pre-trained weights, are distributed together, they have greatly reduced the cost and time required for natural language processing work. Transformer variants are currently the most representative pre-trained language models offering these advantages. They are applied to a wide range of natural language processing tasks, including translation, document summarization, question answering such as chatbots, natural sentence generation, and document classification, and they are also being actively used in other fields such as computer vision and audio processing. To make it easier for researchers to understand pre-trained language models and apply them to natural language processing tasks, this paper starts from the definitions of the language model and the pre-trained language model, and then surveys the development of pre-trained language models, with a focus on representative Transformer variants.
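The workflow summarized in the abstract, loading a pre-trained tokenizer and a model with pre-trained weights that are distributed together and then fine-tuning on a small labeled dataset, can be sketched as follows. This is a minimal illustration only, assuming the Hugging Face transformers and datasets libraries; the multilingual BERT checkpoint, the small IMDB subset, and the hyperparameters are illustrative assumptions, not choices made in the paper.

    # Minimal fine-tuning sketch: a single pre-trained checkpoint provides both
    # the tokenizer and the model weights, so only a small labeled dataset and a
    # short training run are needed for a downstream classification task.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "bert-base-multilingual-cased"  # illustrative pre-trained model

    # The tokenizer and the pre-trained weights are loaded from the same
    # checkpoint identifier, since they are distributed together.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # A small labeled corpus stands in for the downstream task data.
    data = load_dataset("imdb", split="train[:2000]").train_test_split(test_size=0.1, seed=0)

    def tokenize(batch):
        # Convert raw text into the sub-word token ids the model expects.
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

    data = data.map(tokenize, batched=True)

    # Fine-tune for a couple of epochs; starting from pre-trained weights lets
    # a small dataset reach useful performance quickly.
    args = TrainingArguments(output_dir="plm-finetune", num_train_epochs=2,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    trainer = Trainer(model=model, args=args,
                      train_dataset=data["train"], eval_dataset=data["test"])
    trainer.train()
    print(trainer.evaluate())

The same pattern applies to the other Transformer variants surveyed in the paper; in practice only the checkpoint identifier and the task-specific head change.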

Acknowledgement

This research was supported by Kyonggi University's Research Year (sabbatical) program for the 2021 academic year.
