Analysis of generative AI's mathematical problem-solving performance: Focusing on ChatGPT 4, Claude 3 Opus, and Gemini Advanced

  • Received : 2024.07.31
  • Accepted : 2024.08.20
  • Published : 2024.08.31

Abstract

As digital and AI-based teaching and learning gain emphasis, discussion of the educational use of generative AI is becoming more active. This study analyzed the mathematical performance of ChatGPT 4, Claude 3 Opus, and Gemini Advanced in solving the examples and exercises in five first-year high school mathematics textbooks. Across a total of 1,317 items, the overall correct-answer rate and performance by skill were examined. ChatGPT 4 had the highest overall correct-answer rate (0.85), followed by Claude 3 Opus (0.67) and Gemini Advanced (0.42). By skill, all three models showed high correct-answer rates in 'Find functions' and 'Prove', and relatively low rates in 'Explain' and 'Draw graphs'. In 'Count' in particular, ChatGPT 4 and Claude 3 Opus achieved a correct-answer rate of 1.00, whereas Gemini Advanced reached only 0.56. In addition, all models struggled with items that required explaining with Venn diagrams or generating images. Based on these results, teachers should identify the strengths and limitations of each AI model and use the models appropriately in class. This study is significant in that, by analyzing the mathematical performance of generative AI, it points to the possibility of using such models in actual mathematics classes. It also offers important implications for redefining the role of teachers in mathematics education in the era of artificial intelligence. Further research is needed to develop cooperative educational models in which generative AI and teachers work together, and to investigate individualized learning approaches that use AI.
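Because the results above are reported as correct-answer rates, overall and by skill, the following minimal Python sketch illustrates that arithmetic. It assumes each item is graded simply as correct or incorrect and that a rate is the number of correct items divided by the number attempted. The records, the `correct_rates` helper, and this binary grading scheme are hypothetical illustrations, not the study's data or scoring procedure.

```python
from collections import defaultdict

# Hypothetical graded items: (model, skill, is_correct).
# These toy records are for illustration only; they are NOT the study's data.
graded = [
    ("ChatGPT 4", "Count", True),
    ("ChatGPT 4", "Count", True),
    ("ChatGPT 4", "Explain", False),
    ("Gemini Advanced", "Count", True),
    ("Gemini Advanced", "Count", False),
    ("Gemini Advanced", "Explain", False),
]


def correct_rates(records):
    """Return (overall, per_skill) correct-answer rates, rounded to 2 decimals.

    Assumes binary grading: a rate is correct items / attempted items.
    """
    totals = defaultdict(lambda: [0, 0])    # model -> [correct, attempted]
    by_skill = defaultdict(lambda: [0, 0])  # (model, skill) -> [correct, attempted]
    for model, skill, ok in records:
        totals[model][0] += int(ok)
        totals[model][1] += 1
        by_skill[(model, skill)][0] += int(ok)
        by_skill[(model, skill)][1] += 1
    overall = {m: round(c / n, 2) for m, (c, n) in totals.items()}
    per_skill = {k: round(c / n, 2) for k, (c, n) in by_skill.items()}
    return overall, per_skill


overall, per_skill = correct_rates(graded)
print(overall)    # {'ChatGPT 4': 0.67, 'Gemini Advanced': 0.33}
print(per_skill)
```

Grouping by `(model, skill)` mirrors the skill-level comparison in the abstract, such as 'Count'. How the study handled partially correct answers is not stated here, so this binary scheme is only an assumption.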
