Exploring the Potential of AI Tools in University Writing Assessment: Comparing Evaluation Criteria between Humans and Generative AI

  • So-Young Park (Division of Education, Sookmyung Women's University) ;
  • ByungYoon Lee (Education Research Institute, Sookmyung Women's University)
  • Received : 2024.09.19
  • Accepted : 2024.10.15
  • Published : 2024.10.31

Abstract

This study, from the perspective of Learning with AI, aimed to explore the educational applicability of writing evaluation criteria generated by artificial intelligence. Specifically, it sought to systematically analyze the similarities and differences between AI-generated criteria and those developed by humans. The research questions were set as follows: 1) What characteristics do the writing evaluation criteria generated by AI tools have? 2) What similarities and differences exist between the writing evaluation criteria generated by humans and by AI tools? GPT and Claude were selected as representative AI tools and were tasked with generating writing evaluation criteria for undergraduate students. These AI-generated criteria were then compared with human-created criteria. The results showed a commonality: both humans and AI tools placed the highest importance on categories related to content. However, while humans evaluated writing using three main categories (content, organization, and language usage), the AI tools included additional categories such as format and citations, original thinking, and overall impression. In general, humans tended to include more detailed items within each evaluation category, whereas the AI tools presented more concise items. Notably, differences were observed in language-related aspects and in the scoring systems, which were influenced by the fact that the AI tools were developed primarily on English-language data. This study offers important insights into the development of collaborative evaluation models between humans and AI, and it explores the potential role of AI as a complementary tool in educational assessment in the future.
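
As an illustration of the procedure summarized above, the sketch below shows how GPT and Claude might each be prompted to generate an undergraduate writing rubric whose categories can then be coded and compared against a human-developed rubric. The model names, prompt wording, and helper functions are assumptions for illustration only and do not reproduce the prompts or settings used in this study.

# Minimal sketch: asking GPT and Claude to propose an undergraduate writing rubric.
# Model names, prompt wording, and the requested JSON shape are illustrative
# assumptions, not the exact configuration used in the study.
from openai import OpenAI
from anthropic import Anthropic

PROMPT = (
    "Propose an evaluation rubric for undergraduate essay writing. "
    "Return JSON with a list of categories, each with a name, a short "
    "description, and a point allocation; the points should sum to 100."
)

def rubric_from_gpt(model: str = "gpt-4o") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.choices[0].message.content

def rubric_from_claude(model: str = "claude-3-5-sonnet-20240620") -> str:
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return response.content[0].text

if __name__ == "__main__":
    # The returned rubrics can be coded by category (content, organization,
    # language use, etc.) and contrasted with a human-developed rubric.
    print(rubric_from_gpt())
    print(rubric_from_claude())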

Acknowledgement

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea in 2020 (NRF-2020S1A3A2A02095447).
