A Suggestion of the Direction of Construction Disaster Document Management through Text Data Classification Model based on Deep Learning

  • Kim, Hayoung (Department of Architectural and Urban Systems Engineering, Ewha Womans University) ;
  • Jang, YeEun (Department of Architectural and Urban Systems Engineering, Ewha Womans University) ;
  • Kang, HyunBin (Department of Architectural and Urban Systems Engineering, Ewha Womans University) ;
  • Son, JeongWook (Department of Architectural and Urban Systems Engineering, Ewha Womans University) ;
  • Yi, June-Seong (Department of Architectural and Urban Systems Engineering, Ewha Womans University)
  • Received : 2021.04.09
  • Reviewed : 2021.09.02
  • Published : 2021.09.30

Abstract

This study proposes an efficient management direction for Korean construction accident cases by examining the performance of a deep learning-based text classification model. Using construction accident reports, which are unstructured text documents, a deep learning model was implemented that classifies each case into one of five representative accident types defined by KOSHA: fall, electric shock, falling object, collapse, and caught-in/between. In initial model tests, the classification accuracy for fall accidents was relatively high, and cases of other types were frequently misclassified as falls. Analysis of the causes indicated that three factors contributed to this pattern: 1) descriptions of specific accident-causing behavior, 2) similar sentence structure, and 3) complex accidents that correspond to multiple types. Of these, complex accidents could be verified through additional experiments, so two accuracy improvement experiments were conducted on them: 1) reclassification and 2) exclusion. Excluding complex accidents improved classification performance by 185.7%, which indicates that the multicollinearity introduced by complex accidents, whose reports contain content on several accident types at once, was resolved. In conclusion, this study suggests the need to establish a system for describing accident circumstances in detail and to manage complex accidents independently.
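The two treatments of complex accidents named in the abstract, reclassification and exclusion, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the category identifiers, and in particular the priority-order rule used to reassign a complex accident to a single type. The abstract does not state how the authors relabeled such cases.

```python
from collections import Counter

# The five accident types from the abstract (identifier spellings are hypothetical).
CATEGORIES = ("fall", "electric_shock", "falling_object", "collapse", "caught_in")

# Hypothetical toy records: (report text, set of accident types that apply).
records = [
    ("worker fell from scaffold", {"fall"}),
    ("formwork collapsed and worker fell", {"collapse", "fall"}),
    ("contact with live wire", {"electric_shock"}),
    ("hand caught in mixer", {"caught_in"}),
    ("pipe dropped from crane and struck worker", {"falling_object"}),
    ("shocked by cable, then fell from ladder", {"electric_shock", "fall"}),
]


def is_complex(types):
    """A complex accident is one that matches more than one type."""
    return len(types) > 1


def exclude_complex(data):
    """Exclusion: keep only reports that match exactly one type."""
    return [(text, next(iter(types))) for text, types in data if not is_complex(types)]


def reclassify_complex(data, priority=CATEGORIES):
    """Reclassification: give every report one label.

    Assumed rule: pick the matching type that appears earliest in `priority`.
    """
    return [(text, min(types, key=priority.index)) for text, types in data]


single_only = exclude_complex(records)
relabeled = reclassify_complex(records)

print(Counter(label for _, label in single_only))
print(Counter(label for _, label in relabeled))
```

In this toy data, the assumed reclassification rule moves both complex reports into the fall class while exclusion leaves the classes balanced, which is one way a relabeling choice could reinforce a bias toward a single class.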

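Project Information

This research was supported by the Ministry of Land, Infrastructure and Transport / Korea Agency for Infrastructure Technology Advancement (grant number 21CTAP-C152263-03).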
