
Performance of ChatGPT on the Korean National Examination for Dental Hygienists

  • Soo-Myoung Bae (Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University) ;
  • Hye-Rim Jeon (Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University) ;
  • Gyoung-Nam Kim (Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University) ;
  • Seon-Hui Kwak (Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University) ;
  • Hyo-Jin Lee (Department of Dental Hygiene, College of Dentistry, Gangneung-Wonju National University)
  • Received: 2024.02.29
  • Reviewed: 2024.03.19
  • Published: 2024.03.31

Abstract

Background: This study aimed to evaluate the accuracy of ChatGPT's responses to questions from the Korean national dental hygienist examination and, by analyzing its incorrect responses, to identify the predominant error types.

Methods: To evaluate ChatGPT-3.5's performance by question type, the researchers classified the 200 questions of the 49th National Dental Hygienist Examination into recall, interpretation, and problem-solving types. The questions were strategically reworded to prevent misunderstandings arising from implied meanings or Korean technical terminology. To assess ChatGPT-3.5's ability to apply previously acquired knowledge, each question was first converted into an open-ended (subjective) format; if ChatGPT-3.5 generated an incorrect response, the original multiple-choice format was then provided. All 200 questions were input into ChatGPT-3.5, and the generated responses were analyzed. The researchers evaluated the accuracy of each response by question type and categorized the incorrect responses by error type (logical, information, and statistical errors). Finally, a response was classified as a hallucination when ChatGPT presented false information as if it were true.

Results: ChatGPT answered 45.5% of the examination questions correctly. Accuracy by question type was 60.3% for recall questions and 13.0% for problem-solving questions. For the problem-solving questions, accuracy was 13.0% in the open-ended format but increased to 43.5% in the multiple-choice format. Logical errors were the most common type of incorrect response, accounting for 65.1% of all errors. Of the 102 incorrectly answered questions, 100 were categorized as hallucinations.

Conclusion: ChatGPT-3.5 showed limited ability to provide evidence-based correct responses to the Korean national dental hygienist examination. Dental hygienists in educational and clinical settings should therefore approach artificial intelligence-generated material with a critical perspective.
