Development of Machine Learning-based Construction Accident Prediction Model Using Structured and Unstructured Data of Construction Sites

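  • 조민건 (Department of Convergence Engineering for Future City, Sungkyunkwan University) ;
  • 이동환 (Department of Convergence Engineering for Future City, Sungkyunkwan University) ;
  • 박주영 (Department of Civil, Architectural and Environmental System Engineering, Sungkyunkwan University) ;
  • 박승희 (School of Civil, Architectural Engineering and Landscape Architecture, Sungkyunkwan University)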
  • Received : 2021.11.05
  • Accepted : 2021.12.16
  • Published : 2022.02.01

Abstract

Policies and research aimed at preventing the steadily increasing number of construction accidents are being actively pursued in the Korean construction industry. Prediction models developed in previous studies relied mainly on structured data, so their predictions did not sufficiently reflect the varied characteristics of construction sites. This study therefore developed a machine learning-based model that predicts construction accidents in advance and reflects site characteristics more fully by using structured data together with text-type unstructured data. For training, 6,826 construction accident cases from the most recent three years were collected from the Construction Safety Management Integrated Information (CSI) system. The structured data were trained with the Decision forest algorithm, which was selected after a performance analysis of five algorithms, and the unstructured data were trained with the BERT language model. The model that used both types of data reached a prediction accuracy of 95.41 %, about 20 % higher than the model that used structured data alone. These results confirm that adding unstructured data effectively improves the performance of the prediction model, and more accurate prediction can be expected to help reduce construction accidents.
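
The abstract does not state how the outputs of the structured-data model and the text model are combined. As a minimal sketch of one common option, late fusion by weighted averaging of class probabilities, the following Python example may help illustrate the idea. The function names, the `text_weight` parameter, the accident-type labels, and the probability values are all hypothetical and are not taken from the study; in practice the two probability vectors would come from a trained decision forest and a fine-tuned BERT classifier.

```python
from typing import Dict, List

# Maps a class label (e.g. an accident type) to a predicted probability.
Probs = Dict[str, float]


def late_fusion(structured: Probs, text: Probs, text_weight: float = 0.5) -> Probs:
    """Combine two class-probability distributions by weighted averaging.

    This is an assumed fusion rule for illustration only; the study may have
    combined its models differently.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    labels = sorted(set(structured) | set(text))
    fused = {
        label: (1.0 - text_weight) * structured.get(label, 0.0)
        + text_weight * text.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("at least one class must have non-zero probability")
    # Renormalize so the fused scores again sum to 1.
    return {label: p / total for label, p in fused.items()}


def predict(probs: Probs) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical outputs for a single accident record.
    structured_probs = {"fall": 0.5, "struck_by": 0.4, "caught_in": 0.1}
    text_probs = {"fall": 0.2, "struck_by": 0.7, "caught_in": 0.1}

    fused_probs = late_fusion(structured_probs, text_probs)
    print("structured only:", predict(structured_probs))
    print("fused:", predict(fused_probs))
```

In this sketch the text model can overturn a weak structured-data prediction when it is more confident, which is one plausible way that unstructured descriptions could raise overall accuracy. Feature-level fusion, where text embeddings are appended to the structured features before training a single classifier, would be an equally reasonable alternative.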

Acknowledgement

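This research was supported by the "National R&D Project for Smart Construction Technology (No. 21SMIP-A158708-02)", implemented by the Ministry of Land, Infrastructure and Transport and the Korea Agency for Infrastructure Technology Advancement and managed by the Korea Expressway Corporation, and by the Smart City Innovation Talent Development Program of the Ministry of Land, Infrastructure and Transport. This paper is a revised and supplemented version of a paper presented at the 2021 CONVENTION.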
