Development of SVM-based Construction Project Document Classification Model to Derive Construction Risk

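  • 강동욱 (Department of Global Smart City Convergence Engineering, Sungkyunkwan University) ;
  • 조민건 (Department of Global Smart City Convergence Engineering, Sungkyunkwan University) ;
  • 차기춘 (Department of Global Smart City Convergence Engineering, Sungkyunkwan University) ;
  • 박승희 (School of Civil and Environmental Engineering, Sungkyunkwan University)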
  • Received : 2023.10.13
  • Accepted : 2023.10.30
  • Published : 2023.12.01

Abstract

Construction projects are exposed to risks from various factors, such as schedule delays and construction accidents. Despite these risks, the construction period of a project is estimated mainly through subjective judgment that relies on the supervisor's experience. In addition, unreasonably compressing the work to recover a schedule set back by delays and accidents leads to negative consequences such as poor-quality construction, and delayed completion causes economic losses because the infrastructure is unavailable. Data-based scientific approaches and statistical analysis are needed to address these risks. However, data collected in actual construction projects is stored as unstructured text, so pre-processing it for data-based risk analysis requires considerable manpower and cost; a data classification model using text mining is therefore needed to generate the basic data. In this study, construction project documents were collected and, using text mining, an SVM (Support Vector Machine)-based classification model was developed to generate document-based basic data for risk management. Through quantitative analysis in future research, the results are expected to serve as efficient and objective basic data for construction project process management and thereby support risk management.
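
As a rough illustration of the kind of pipeline the abstract describes (text mining followed by an SVM classifier), the following minimal sketch turns short documents into TF-IDF vectors and trains a linear SVM by stochastic subgradient descent on the hinge loss. It is not the authors' implementation: the toy corpus, the two labels (+1 for risk-related, -1 for other), the use of TF-IDF features, the omission of a bias term, and all function names and hyperparameters are assumptions made only for this example.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (not from the study): +1 = risk-related, -1 = other.
DOCS = [
    ("heavy rain delayed concrete pouring schedule", 1),
    ("worker fall accident at scaffold delayed work", 1),
    ("crane failure accident caused schedule delay", 1),
    ("typhoon rain stopped excavation work", 1),
    ("monthly invoice payment approved by client", -1),
    ("meeting minutes for design review approved", -1),
    ("material invoice submitted to client office", -1),
    ("design drawing review meeting scheduled", -1),
]


def tokenize(text):
    """Whitespace tokenizer; real Korean text would need a morphological analyzer."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(samples, lam=0.01, lr=0.1, epochs=100, seed=0):
    """SGD on (lam/2)*||w||^2 + hinge loss; bias term omitted for brevity."""
    rng = random.Random(seed)
    w = {}
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            x, y = samples[i]
            margin = y * dot(w, x)
            for t in w:                      # L2 shrinkage from the regularizer
                w[t] *= 1.0 - lr * lam
            if margin < 1.0:                 # hinge-loss subgradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w


token_lists = [tokenize(text) for text, _ in DOCS]
idf = fit_idf(token_lists)
samples = [(tfidf(toks, idf), y) for toks, (_, y) in zip(token_lists, DOCS)]
weights = train_linear_svm(samples)


def classify(text):
    """Return +1 (risk-related) or -1 (other) for a new document."""
    return 1 if dot(weights, tfidf(tokenize(text), idf)) > 0 else -1


print(classify("rain delayed work"))
print(classify("invoice approved by client"))
```

In practice a library implementation with a held-out test set or k-fold cross-validation would replace this hand-written trainer; the sketch only shows how unstructured text can be mapped to features and separated by a linear decision boundary.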

Acknowledgement

This research was conducted with the support of the "National R&D Project for Smart Construction Technology (22SMIP-A158708-03)" funded by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport, and managed by the Korea Expressway Corporation. This research was also supported by grant RS-2023-00248092 of the Technology Development Program on Disaster Restoration Capacity Building and Strengthening, funded by the Ministry of Interior and Safety (MOIS, Korea).
