Matching prediction on Korean professional volleyball league

한국 프로배구 연맹의 경기 예측 및 영향요인 분석

  • Heesook Kim (Department of Statistics, Ewha Womans University) ;
  • Nakyung Lee (Department of Statistics, Ewha Womans University) ;
  • Jiyoon Lee (Department of Statistics, Ewha Womans University) ;
  • Jongwoo Song (Department of Statistics, Ewha Womans University)
  • 김희숙 (이화여자대학교 통계학과) ;
  • 이나경 (이화여자대학교 통계학과) ;
  • 이지윤 (이화여자대학교 통계학과) ;
  • 송종우 (이화여자대학교 통계학과)
  • Received : 2023.09.03
  • Accepted : 2023.11.01
  • Published : 2024.06.30

Abstract

This study analyzes the Korean professional volleyball league and predicts match outcomes using popular machine learning classification methods. Detailed match data from the 2012/2013 to 2022/2023 seasons were collected for both the men's and women's leagues. Two data structures were applied to the models: one that separates each match into two team-level records, and one that uses performance differentials between the home and away teams. Applying both structures to both leagues yielded a total of four predictive models. Because the match statistics used as variables are unavailable until a match ends, the results of the most recent 3 to 4 matches played before the match being predicted were preprocessed and used as variables. Logistic Regression, Decision Tree, Bagging, Random Forest, XGBoost, AdaBoost, and LightGBM were employed for classification, and the Random Forest model showed the highest predictive performance. The results indicated that, although the important variables varied by gender and data structure, set success rate, blocking points scored, and the number of faults were consistently important. Notably, the win-loss prediction model is distinctive in providing pre-match forecasts rather than post-match predictions.

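This study systematically analyzes the Korean professional volleyball league and predicts match outcomes using representative machine learning classification methods. To this end, match data for the men's and women's professional volleyball leagues from the 2012/2013 season to the 2022/2023 season were collected, including detailed match statistics. Two data structures were applied to the models: one in which each match is split into two teams, and one in which the data are processed into performance differences between the home team and its opponent. In this way, a total of four predictive models were built, covering the men's and women's leagues. Because the detailed variable values used by the models cannot be known before a match ends, the results of the 3 to 4 matches immediately preceding the match to be predicted were preprocessed and used as variables. Various machine learning methods, including Decision Tree, Logistic Regression, Bagging, Random Forest, Xgboost, Adaboost, and Light GBM, were used for classification, and the model using Random Forest showed the best predictive performance. Variable importance plots and partial dependence plots for the final selected models showed that the important variables differed by gender and data structure, but that the number of successful sets, blocking points, and the number of faults were commonly the most important variables. This win-loss prediction model is distinctive in that it allows prediction before a match ends rather than after the fact, and we expect our analysis to serve as a basis for strategic reasoning for Korean professional volleyball teams.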
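The pre-match preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`home`, `away`, `home_stats`, `away_stats`, `home_win`), the function name, the fixed window of 3 matches, and the use of a simple mean over that window are all assumptions made for illustration, since the abstract states only that the most recent 3 to 4 matches were used. The sketch builds the home-minus-away differential structure and updates each team's history only after the features for a match are computed, so that no statistic from the match being predicted leaks into its own features.

```python
from collections import defaultdict, deque
from statistics import mean


def pre_match_differentials(matches, window=3):
    """Build home-minus-away features from each team's previous matches.

    `matches` is a list of dicts sorted by date (hypothetical schema):
      home, away       -- team names
      home_stats,
      away_stats       -- dicts mapping a statistic name to its value
      home_win         -- True if the home team won
    A match yields a row only when both teams already have `window`
    earlier matches in their history.
    """
    # Each team's history keeps only its last `window` stat dicts.
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for m in matches:
        home_hist = history[m["home"]]
        away_hist = history[m["away"]]
        if len(home_hist) == window and len(away_hist) == window:
            row = {}
            for stat in m["home_stats"]:
                home_avg = mean(past[stat] for past in home_hist)
                away_avg = mean(past[stat] for past in away_hist)
                row[f"diff_{stat}"] = home_avg - away_avg
            row["home_win"] = m["home_win"]  # label to be predicted
            rows.append(row)
        # Update histories only after the features were computed, so the
        # current match never contributes to its own predictors.
        home_hist.append(m["home_stats"])
        away_hist.append(m["away_stats"])
    return rows
```

The resulting rows could then be passed to any of the classifiers named in the abstract. The alternative structure, in which each match is split into two team-level records, would emit one row per team instead of a single differential row.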
