A Feature Analysis of Industrial Accidents Using C4.5 Algorithm

  • Leem, Young-Moon (Department of Industrial Systems Engineering, Kangnung National University) ;
  • Kwag, Jun-Koo (Department of Industrial Systems Engineering, Kangnung National University) ;
  • Hwang, Young-Seob (Department of Industrial Systems Engineering, Kangnung National University)
  • Published : 2005.12.31

Abstract

Decision tree algorithms are a data mining technique that partitions a group of interest into several subgroups for classification or prediction. The technique can characterize each group by type and can therefore be used to detect differences among types of industrial accidents. This paper applies the C4.5 algorithm to this feature analysis. The data set consists of 24,887 records selected from a total of 25,159 collected over two years of observation of industrial accidents in Korea. For the purpose of this paper, one target variable and eight independent variables are specified by type of industrial accident. After grouping, the tree has 222 nodes in total, of which 151 are leaf nodes. This paper provides an acceptable level of accuracy (%) and error rate (%) as measures of the quality of the generated trees. The objective of this paper is to analyze how efficiently the C4.5 algorithm classifies industrial accident data by type, and thereby to identify potential weak points in disaster risk grouping.
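The abstract does not describe the implementation, so the following is only a minimal sketch of the two quantities it relies on: the gain ratio that C4.5 uses to choose a split, and the accuracy and error rate (%) used to evaluate a tree. The field names (`industry`, `age_group`, `accident_type`) and the four toy records are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a categorical attribute."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def accuracy_and_error(actual, predicted):
    """Return (accuracy %, error rate %) for two equal-length label lists."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = 100.0 * correct / len(actual)
    return accuracy, 100.0 - accuracy


# Hypothetical toy records; the attribute names are assumptions for illustration.
rows = [
    {"industry": "construction", "age_group": "under_40", "accident_type": "fall"},
    {"industry": "construction", "age_group": "40_plus", "accident_type": "fall"},
    {"industry": "manufacturing", "age_group": "under_40", "accident_type": "caught_in"},
    {"industry": "manufacturing", "age_group": "40_plus", "accident_type": "caught_in"},
]

# C4.5 would split on the attribute with the highest gain ratio.
best = max(["industry", "age_group"], key=lambda a: gain_ratio(rows, a, "accident_type"))
```

In this toy set `industry` separates the two accident types perfectly while `age_group` carries no information, so `best` is `industry`. A full C4.5 implementation would also handle continuous attributes, missing values, and pruning, none of which is sketched here.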
