Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv

  • Park, Dae-Min (School of Media & Communication, Sunmoon University)
  • Lee, Han-Jong (Graduate School of Communication, Seoul National University)
  • Received : 2024.04.30
  • Accepted : 2024.05.20
  • Published : 2024.06.30

Abstract

Hallucination is a major barrier to the practical use of large language models (LLMs) and large multimodal models. To explore the latest trends in hallucination research, this study collected 654 computer science papers containing "hallucination" in the abstract, posted to arXiv between December 2022 and January 2024 (the period following the advent of ChatGPT), and conducted frequency analysis, knowledge network analysis, and a literature review. Through these analyses, we identified the major authors, key keywords, and major fields, as well as the relationships among fields. The results showed that research was most active in "Computation and Language," "Artificial Intelligence," "Computer Vision and Pattern Recognition," and "Machine Learning." We then reviewed research trends in these four major fields, focusing on the main authors and dividing the work into three aspects: data, hallucination detection, and hallucination mitigation. The main trends included hallucination mitigation through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), reasoning enhancement through techniques such as chain-of-thought (CoT) prompting, the combined use of automated and human-in-the-loop approaches, and growing interest in hallucination mitigation for multimodal AI. By capturing the latest developments through this technology-oriented literature review, the study is expected to serve as a foundation for subsequent research in both engineering and the humanities and social sciences.
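As a rough illustration of the data-collection step described above, the following Python sketch queries the public arXiv Atom API for computer science papers whose abstracts mention "hallucination". It is a minimal reconstruction under stated assumptions, not the authors' actual pipeline: the exact query string, the category wildcard `cat:cs.*`, and the date filtering are inferred from the abstract.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Minimal reconstruction of the data-collection step: query the public
# arXiv Atom API for CS papers whose abstract mentions "hallucination".
# The query string, category wildcard, and date window are assumptions
# inferred from the abstract, not the authors' code.
BASE = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_entries(start: int = 0, max_results: int = 100):
    query = urllib.parse.urlencode({
        "search_query": "abs:hallucination AND cat:cs.*",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "ascending",
    })
    with urllib.request.urlopen(f"{BASE}?{query}") as resp:
        return ET.fromstring(resp.read()).findall(f"{ATOM}entry")

papers = []
for entry in fetch_entries():  # paginate over `start` to reach the full 654
    published = entry.findtext(f"{ATOM}published") or ""
    if "2022-12" <= published[:7] <= "2024-01":  # the study window
        papers.append({
            "title": " ".join(entry.findtext(f"{ATOM}title").split()),
            "categories": [c.get("term")
                           for c in entry.findall(f"{ATOM}category")],
        })
print(len(papers), "papers collected in this batch")
```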

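The frequency and knowledge network analyses could, for example, be operationalized as a weighted co-occurrence network over arXiv subject categories. The sketch below is one plausible reading of the field-relationship analysis, not the authors' documented method; it assumes the `networkx` library and the `papers` list from the previous sketch (a tiny inline sample is included so it runs standalone).

```python
from itertools import combinations

import networkx as nx  # assumed dependency; not named in the paper

# `papers` as produced by the previous sketch; replace this inline sample
# with the real collection.
papers = [
    {"title": "...", "categories": ["cs.CL", "cs.AI"]},
    {"title": "...", "categories": ["cs.CV", "cs.LG", "cs.AI"]},
]

# Hypothetical knowledge network: nodes are arXiv categories, and edge
# weights count how many papers are tagged with both categories.
G = nx.Graph()
for paper in papers:
    for a, b in combinations(sorted(set(paper["categories"])), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Frequency-analysis proxy: weighted degree ranks how central each field is.
ranking = sorted(G.degree(weight="weight"), key=lambda pair: -pair[1])
print("Most connected fields:", ranking[:4])  # e.g., cs.CL, cs.AI, cs.CV, cs.LG
```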

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A7083908).
