References
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, ... & I. Polosukhin. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008). DOI: 10.5555/3295222.3295349
- J. A. Bernard. (1988). Use of a rule-based system for process control. IEEE Control Systems Magazine, 8(5), 3-13. DOI: 10.1109/37.7735
- C. Manning & H. Schütze. (1999). Foundations of statistical natural language processing. MIT Press. DOI: 10.5555/311445
- J. Devlin, M. W. Chang, K. Lee & K. Toutanova. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171-4186). DOI: 10.18653/v1/N19-1423
- Y. Liu et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- K. Clark, M. T. Luong, Q. V. Le & C. D. Manning. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In International Conference on Learning Representations. URL: https://openreview.net/forum?id=r1xMH1BtvB
- S. Park, J. Moon, S. Kim, W. I. Cho, J. Han, J. Park, ... & K. Cho. (2021). KLUE: Korean Language Understanding Evaluation. arXiv preprint arXiv:2105.09680.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei & I. Sutskever. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
- T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, ... & D. Amodei. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
- M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, ... & L. Zettlemoyer. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
- C. Park, J. Seo, S. Lee, C. Lee, H. Moon, S. Eo & H. S. Lim. (2021, August). BTS: Back TranScription for speech-to-text post-processor using text-to-speech-to-text. In Proceedings of the 8th Workshop on Asian Translation (WAT2021) (pp. 106-116). DOI: 10.18653/v1/2021.wat-1.10
- K. Gregor, I. Danihelka, A. Mnih, C. Blundell & D. Wierstra. (2014, June). Deep autoregressive networks. In International Conference on Machine Learning (pp. 1242-1250). PMLR.
- H. Rashkin, E. M. Smith, M. Li & Y. L. Boureau. (2018). Towards empathetic open-domain conversation models: A new benchmark and dataset. arXiv preprint arXiv:1811.00207.
- S. Robertson & H. Zaragoza. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389. DOI: 10.1561/1500000019
- S. Humeau, K. Shuster, M. A. Lachaux & J. Weston. (2019, September). Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. In International Conference on Learning Representations. URL: https://openreview.net/forum?id=SkxgnnNFvH
- T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, ... & A. M. Rush. (2019). HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
- I. Loshchilov & F. Hutter. (2017). Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.
- S. Wiseman & A. M. Rush. (2016, November). Sequence-to-Sequence Learning as Beam-Search Optimization. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 1296-1306). DOI: 10.18653/v1/D16-1137
- S. Roller, E. Dinan, N. Goyal, D. Ju, M. Williamson, Y. Liu, ... & J. Weston. (2020). Recipes for building an open-domain chatbot. arXiv preprint arXiv:2004.13637.
- K. Papineni, S. Roukos, T. Ward & W. J. Zhu. (2002, July). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). DOI: 10.3115/1073083.1073135
- C. Y. Lin. (2004, July). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81). URL: https://aclanthology.org/W04-1013
- T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger & Y. Artzi. (2019, September). BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations. URL: https://openreview.net/forum?id=SkeHuCVFDr
- S. Gehrmann, T. Adewumi, K. Aggarwal, P. S. Ammanamanchi, A. Anuoluwapo, A. Bosselut, ... & J. Zhou. (2021). The gem benchmark: Natural language generation, its evaluation and metrics. arXiv preprint arXiv:2102.01672.
- T. Schick & H. Schütze. (2021, June). It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 2339-2352). DOI: 10.18653/v1/2021.naacl-main.185
- T. Gao, A. Fisch & D. Chen. (2020). Making pre-trained language models better few-shot learners. arXiv preprint arXiv:2012.15723.