Acknowledgement
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) in 2023 (No. NRF-2022R1A2C1005316).