Exploring a Method for Enhancing Non-expert Evaluation Accuracy: Using Weighting Functions Based on Common Evaluation Items


  • Min Hae Song (Department of Psychology & Asia Center, Seoul National University) ;
  • Hyunwoo Gu (Department of Brain & Cognitive Sciences, Seoul National University) ;
  • Jungyeon Park (Department of Psychology & Asia Center, Seoul National University) ;
  • Jaeseo Lim (Department of Counselling Psychology, Jeonju University) ;
  • Jooyong Park (Department of Psychology & Asia Center, Seoul National University)
  • Received : 2024.07.04
  • Accepted : 2024.07.20
  • Published : 2024.09.30

Abstract

Evaluation activities are beneficial for learning and training, but they are not widely used because of concerns about the accuracy of non-expert evaluations. Although methods exist to improve accuracy, they have the limitation of requiring additional procedures or processes beyond the evaluation itself. In this study, we sought to improve the accuracy of non-expert evaluations by using a small number of common evaluation items and assigning each evaluator a weight based on the difference between their scores and the expert scores on those items. In Study 1, we ran a simulation in which 50 hypothetical non-experts evaluated essays. The effect of the correction depended on how closely the non-experts' scores were correlated with the experts': when the correlation was high, correction based on the common evaluation items had no effect, but when the non-experts' evaluation methods differed from those of the experts, correcting scores with weights derived from a single common evaluation item improved evaluation accuracy. In Study 2, we analyzed data from an experimental setting in which non-experts evaluated one another's argumentative essays, and obtained the same pattern of results as in Study 1: the proposed method improved evaluation accuracy when non-experts' evaluation methods differed from those of experts. In the discussion, we address the applicability of the proposed method to real-world evaluation settings.
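The abstract does not specify the exact weighting function, so the following is only a minimal sketch of the general idea under assumed choices: each non-expert also scores a common item that an expert has scored, and that non-expert's other scores are down-weighted in proportion to their discrepancy from the expert on the common item. The inverse-absolute-difference weight, the epsilon constant, and all function and variable names below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (assumed weighting form, not the paper's actual function):
# raters score one "common" calibration essay that an expert has also scored,
# and each rater's score on the target essay is weighted by how closely their
# calibration score matches the expert's.

def calibration_weight(rater_common_score: float,
                       expert_common_score: float,
                       eps: float = 1.0) -> float:
    """Larger discrepancy from the expert on the common item -> smaller weight."""
    return 1.0 / (eps + abs(rater_common_score - expert_common_score))

def corrected_score(target_scores: dict[str, float],
                    common_scores: dict[str, float],
                    expert_common_score: float) -> float:
    """Weighted average of non-expert scores for one essay.

    target_scores: rater id -> score given to the essay being graded
    common_scores: rater id -> score the same rater gave the common item
    """
    weights = {r: calibration_weight(common_scores[r], expert_common_score)
               for r in target_scores}
    total = sum(weights.values())
    return sum(weights[r] * target_scores[r] for r in target_scores) / total

# Example: three non-experts grade an essay; rater "c" deviates most from the
# expert on the common item, so their score counts least in the corrected score.
print(corrected_score(
    target_scores={"a": 7.0, "b": 8.0, "c": 3.0},
    common_scores={"a": 6.0, "b": 5.5, "c": 9.0},
    expert_common_score=6.0,
))
```

Under this sketch, a rater whose common-item score matches the expert exactly contributes with full weight, while raters who diverge contribute proportionally less; whether the correction helps then depends, as the abstract notes, on how far the non-experts' scoring actually departs from the experts'.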
