Analysis of Leaf Node Ranking Methods for Spatial Event Prediction

Analysis of Leaf Node Rank Determination Methods for Spatial Event Prediction in Decision Trees

  • Yeon, Young-Kwang (Geoscience Research Division, Korea Institute of Geoscience and Mineral Resources)
  • Received : 2014.09.16
  • Accepted : 2014.11.24
  • Published : 2014.12.31

Abstract

Spatial events can be predicted with data mining classification algorithms, and decision trees are among the most representative of these. Decision trees have mainly been used for classification tasks with labeled class values, but since rule ranking methods began to be applied to computing leaf node ranks, they have also been used for spatial event prediction. This paper compares rule ranking methods for spatial prediction with a decision tree. For the experiment, the C4.5 decision tree algorithm and the Laplace, M-estimate, and m-branch rule ranking methods were implemented and applied to landslides, a representative spatial event that occurs in the natural environment. In the accuracy evaluation, accuracy differed according to the characteristics of each method, and m-branch achieved the highest accuracy. However, methods with an additional parameter, such as m-branch and M-estimate, required a time-consuming iterative search for the optimal parameter value. The methods can therefore be chosen selectively according to the application. Spatial prediction using a decision tree yields not only the prediction itself but also supports causal analysis of the predicted result at a specific location, which opens up a range of applications.
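To make the idea of ranking leaf nodes concrete, the following is a minimal sketch of two of the methods named in the abstract, Laplace and M-estimate, using their standard textbook forms. It is not the paper's implementation: the leaf counts, the leaf names, the choice of the overall event rate as the prior, and the value m = 10 are all hypothetical and chosen only for illustration. m-branch is omitted because its exact formulation is not given here.

```python
from typing import Callable, Dict, List, Tuple

def laplace(k: int, n: int, classes: int = 2) -> float:
    """Laplace-corrected estimate: (k + 1) / (n + C)."""
    return (k + 1) / (n + classes)

def m_estimate(k: int, n: int, prior: float, m: float) -> float:
    """M-estimate: (k + m * prior) / (n + m); larger m pulls toward the prior."""
    return (k + m * prior) / (n + m)

def rank_leaves(leaves: Dict[str, Tuple[int, int]],
                score: Callable[[int, int], float]) -> List[str]:
    """Return leaf names ordered from highest to lowest score."""
    return sorted(leaves, key=lambda name: score(*leaves[name]), reverse=True)

# Hypothetical leaves: name -> (event cells, total cells) reaching that leaf.
leaves = {"A": (2, 2), "B": (40, 50), "C": (1, 20)}

# Assumption: use the overall event rate as the prior for the M-estimate.
prior = sum(k for k, _ in leaves.values()) / sum(n for _, n in leaves.values())

raw_rank = rank_leaves(leaves, lambda k, n: k / n)
laplace_rank = rank_leaves(leaves, laplace)
m_rank = rank_leaves(leaves, lambda k, n: m_estimate(k, n, prior, m=10.0))

print("raw frequency:", raw_rank)
print("Laplace:      ", laplace_rank)
print("M-estimate:   ", m_rank)
```

With raw frequencies, the tiny leaf A (2 of 2 cells) outranks the much better supported leaf B (40 of 50 cells). Both smoothed estimates discount leaves with few samples, which is why the choice of ranking method, and of the parameter m, can change the resulting susceptibility ordering.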
