Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model

  • Jeon, In-seong (Dept. of Computer Education, Korea National University of Education)
  • Song, Ki-Sang (Dept. of Computer Education, Korea National University of Education)
  • Received : 2022.04.25
  • Accepted : 2022.05.25
  • Published : 2022.06.30

Abstract

In this paper, we develop a machine learning based block code generation and recommendation model aimed at reducing learners' cognitive load during coding education. Using a natural language processing model with transfer learning and fine-tuning, the model learns the block code a learner has already assembled in a block programming environment and then generates and recommends blocks the learner can select in the next step. To develop the model, a training dataset was produced by pre-processing the block code of 50 popular projects on the block programming language website 'Entry'. After dividing the pre-processed blocks into training, validation, and test sets, we built block code generation models based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation, GPT-2 achieved higher scores than the LSTM and Seq2Seq models on BLEU and ROUGE, which measure sentence similarity. Examining the data actually generated by the GPT-2 model showed that its BLEU and ROUGE scores were relatively similar across block counts, except when the number of blocks was 1 or 17.
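
The paper does not include its implementation, but the GPT-2 based generation step described above can be illustrated with a minimal sketch. It assumes each Entry block is pre-processed into a block-name token; the block names, example sequences, hyperparameters, and the use of the Hugging Face transformers library are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (not the authors' code): fine-tune GPT-2 on pre-processed block
# sequences and sample candidate next blocks. Block names and data are hypothetical.
import torch
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One pre-processed project = one "sentence" of block-name tokens.
train_sequences = [
    "when_run_button_clicked move_direction repeat_basic move_direction",
    "when_object_click play_sound wait_second move_direction",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):                              # a few epochs for the sketch
    for seq in train_sequences:
        enc = tokenizer(seq, return_tensors="pt")
        loss = model(**enc, labels=enc["input_ids"]).loss   # causal LM loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Recommendation: feed the learner's partial block code and sample a continuation.
model.eval()
prompt = "when_run_button_clicked move_direction"
ids = tokenizer(prompt, return_tensors="pt").input_ids
gen = model.generate(ids, max_new_tokens=8, do_sample=True, top_k=20,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(gen[0], skip_special_tokens=True))
```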

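Similarly, the BLEU and ROUGE comparison between a generated block sequence and the reference block sequence could look like the sketch below; the nltk and rouge_score packages and the example sequences are assumptions for illustration, not taken from the paper.

```python
# Sketch of the evaluation step: score a generated block sequence against the
# reference (actual) sequence with BLEU and ROUGE. Example sequences are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "move_direction repeat_basic move_direction play_sound"
generated = "move_direction repeat_basic wait_second play_sound"

# BLEU over block-name tokens (smoothing avoids zero scores on short sequences).
bleu = sentence_bleu([reference.split()], generated.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-1 and ROUGE-L F-measures over the same sequences.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=False)
rouge = scorer.score(reference, generated)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}, "
      f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```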

Acknowledgement

This work was supported by a Basic Research Program grant from the National Research Foundation of Korea, funded by the Korean government (Ministry of Education) in 2022 (No. 2021R1I1A3052234).
