Literature Review of AI Hallucination Research Since the Advent of ChatGPT: Focusing on Papers from arXiv

  • Park, Dae-Min (School of Media & Communication, Sunmoon University)
  • Lee, Han-Jong (Graduate School of Communication, Seoul National University)
  • Received : 2024.04.30
  • Accepted : 2024.05.20
  • Published : 2024.06.30

Abstract

Hallucination is a major barrier to the practical use of large language models (LLMs) and large multimodal models. To explore the latest trends in hallucination research, this study collected 654 computer science papers containing "hallucination" in the abstract, posted to arXiv between December 2022 and January 2024 (the period following the advent of ChatGPT), and conducted frequency analysis, knowledge network analysis, and a literature review. Through these analyses, we identified the major authors, key keywords, and major fields, as well as the relationships among fields. The results showed that research was most active in "Computation and Language," "Artificial Intelligence," "Computer Vision and Pattern Recognition," and "Machine Learning." We then reviewed research trends in these four major fields, focusing on the main authors and dividing the work into three aspects: data, hallucination detection, and hallucination mitigation. The main trends included hallucination mitigation through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), reasoning enhancement through techniques such as chain-of-thought (CoT) prompting, the combined use of automated and human-in-the-loop approaches, and growing interest in hallucination mitigation for multimodal AI. By capturing the latest developments through this technology-oriented literature review, the study is expected to serve as a foundation for subsequent research in both engineering and the humanities and social sciences.
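As a rough illustration of the data-collection step described above, the following Python sketch queries the public arXiv Atom API for computer science papers whose abstracts mention "hallucination". It is a minimal reconstruction under stated assumptions, not the authors' actual pipeline: the exact query string, the category wildcard `cat:cs.*`, and the date filtering are inferred from the abstract.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Minimal reconstruction of the data-collection step: query the public
# arXiv Atom API for CS papers whose abstract mentions "hallucination".
# The query string, category wildcard, and date window are assumptions
# inferred from the abstract, not the authors' code.
BASE = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_entries(start: int = 0, max_results: int = 100):
    query = urllib.parse.urlencode({
        "search_query": "abs:hallucination AND cat:cs.*",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "ascending",
    })
    with urllib.request.urlopen(f"{BASE}?{query}") as resp:
        return ET.fromstring(resp.read()).findall(f"{ATOM}entry")

papers = []
for entry in fetch_entries():  # paginate over `start` to reach the full 654
    published = entry.findtext(f"{ATOM}published") or ""
    if "2022-12" <= published[:7] <= "2024-01":  # the study window
        papers.append({
            "title": " ".join(entry.findtext(f"{ATOM}title").split()),
            "categories": [c.get("term")
                           for c in entry.findall(f"{ATOM}category")],
        })
print(len(papers), "papers collected in this batch")
```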

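The frequency and knowledge network analyses could, for example, be operationalized as a weighted co-occurrence network over arXiv subject categories. The sketch below is one plausible reading of the field-relationship analysis, not the authors' documented method; it assumes the `networkx` library and the `papers` list from the previous sketch (a tiny inline sample is included so it runs standalone).

```python
from itertools import combinations

import networkx as nx  # assumed dependency; not named in the paper

# `papers` as produced by the previous sketch; replace this inline sample
# with the real collection.
papers = [
    {"title": "...", "categories": ["cs.CL", "cs.AI"]},
    {"title": "...", "categories": ["cs.CV", "cs.LG", "cs.AI"]},
]

# Hypothetical knowledge network: nodes are arXiv categories, and edge
# weights count how many papers are tagged with both categories.
G = nx.Graph()
for paper in papers:
    for a, b in combinations(sorted(set(paper["categories"])), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Frequency-analysis proxy: weighted degree ranks how central each field is.
ranking = sorted(G.degree(weight="weight"), key=lambda pair: -pair[1])
print("Most connected fields:", ranking[:4])  # e.g., cs.CL, cs.AI, cs.CV, cs.LG
```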

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A7083908).
