• Title/Summary/Keyword: Classification Tree Method

A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

  • Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.415-422 / 2017
  • Finding the influential factors behind a given clustering result is a typical data science problem. A genetic-algorithm-based method is proposed to derive influential factors, and its performance is compared with that of two conventional methods, Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID), using Dunn's index. To extract the factors that influence preference for political parties in South Korea, the voting results of the $18^{th}$ presidential election and 'Demographic', 'Health and Welfare', 'Economic', and 'Business' data were used. Based on the analysis, reverse engineering was implemented. A reverse-engineering-based approach to influential factor analysis can provide a new set of influential variables, which may offer new insight to the data mining field.
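
Dunn's index, the measure used in the abstract above to compare methods, is the smallest distance between points in different clusters divided by the largest distance between points in the same cluster; higher values indicate compact, well-separated clusters. The following is a minimal sketch of that standard definition, not the authors' code. The function name `dunn_index` and the toy clusters are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's index for a list of clusters, each a list of coordinate tuples."""
    # Smallest distance between two points that belong to different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points of the same cluster (cluster diameter).
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; index is undefined")
    return min_separation / max_diameter


# Hypothetical toy data: two tight clusters that lie three units apart.
toy_clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
print(dunn_index(toy_clusters))  # 3.0
```

This brute-force version compares every pair of points, which is adequate for small data sets but scales quadratically with the number of points.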

Development of Traffic Accident Models in Seoul Considering Land Use Characteristics (토지이용특성을 고려한 서울시 교통사고 발생 모형 개발)

  • Lim, Samjin;Park, Juntae
    • Journal of the Society of Disaster Information / v.9 no.1 / pp.30-49 / 2013
  • In this research, we developed a new traffic accident forecasting model based on land use. Forecasting models by accident type were developed using the Classification and Regression Tree method, based on market segmentation and on additional variables that reflect the characteristics of various regions. The analysis identified activity variables such as registered population and commuters, as well as road size and accident-causing facilities that are the subjects of those activities, as the variables explaining traffic accidents.

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.627-635 / 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
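
The bias discussed above can be illustrated with a small sketch. The abstract does not give the exact form of the penalty, so the version below is an assumption: it subtracts `penalty_weight` times the number of categories from the plain information gain. The names `penalized_gain` and `penalty_weight` and the toy data are hypothetical, and this is not the authors' formula.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on a categorical variable."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_gain(values, labels, penalty_weight=0.05):
    """Gain minus a penalty proportional to the number of categories (assumed form)."""
    n_categories = len(set(values))
    return information_gain(values, labels) - penalty_weight * n_categories


# Hypothetical toy data: both variables separate the labels perfectly,
# but the second one does so only by giving every row its own category.
labels = ["y", "y", "n", "n"]
two_way = ["a", "a", "b", "b"]
four_way = ["p", "q", "r", "s"]
print(information_gain(two_way, labels), information_gain(four_way, labels))
print(penalized_gain(two_way, labels), penalized_gain(four_way, labels))
```

In this toy case the plain gain ties, while the penalized score prefers the variable with fewer categories, which is the direction of correction the abstract describes.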

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society / v.39 no.6 / pp.5-12 / 2023
  • Proactive assessment of landslide susceptibility is necessary for minimizing casualties. This study proposes a methodology for classifying the landslide safety factor using machine-learning classification algorithms. The high-risk area model is adopted to perform the classification, with eight geotechnical parameters as inputs. Four classification algorithms (decision tree, k-nearest neighbor, logistic regression, and random forest) are compared in terms of classification accuracy for safety factors ranging between 1.2 and 2.0. High accuracy is obtained in the safety factor range of 1.2~1.7, but relatively low accuracy in the range of 1.8~2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. Applying SMOTE improves the average accuracy by ~250% in the safety factor range of 1.8~2.0. The results demonstrate that the SMOTE algorithm improves the accuracy of classification algorithms when applied to geotechnical data.
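
The core idea of SMOTE mentioned above is to create each synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea using only the standard library; it is not the study's implementation. The function name `smote`, its parameters, and the sample values are hypothetical.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from minority-class samples (tuples)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority samples: (safety factor, one geotechnical parameter).
minority_samples = [(1.8, 10.0), (1.9, 12.0), (2.0, 11.0), (1.85, 10.5)]
for point in smote(minority_samples, n_new=3, k=2):
    print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the generated data stay inside the region the minority class already occupies.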

PREPARATION OF CARBON DIOXIDE ABSORPTION MAP USING KOMPSAT-2 IMAGERY

  • Kim, So-Ra;Lee, Woo-Kyun
    • Proceedings of the KSRS Conference / 2008.10a / pp.200-203 / 2008
  • The objective of this study is to produce a $CO_2$ (carbon dioxide) absorption map using KOMPSAT-2 imagery. To estimate the amount of $CO_2$ absorption, the stand biomass of the forest was estimated as the total weight, that is, the sum of the individual tree weights. Individual tree volumes were estimated from the crown width extracted from the KOMPSAT-2 imagery. The carbon conversion index and the ratio of the $CO_2$ molecular weight to the C atomic weight, both reported in the IPCC (Intergovernmental Panel on Climate Change) guideline, were used to convert the stand biomass into the amount of $CO_2$ absorption. The KOMPSAT-2 imagery was then classified with the SBC (segment based classification) method in order to quantify $CO_2$ absorption by tree species. As a result, a map of $CO_2$ absorption was produced and the amount of $CO_2$ absorption was estimated for each tree species.
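
The conversion chain described above (tree weights, then stand biomass, then carbon, then $CO_2$) can be written as a short calculation. This is a sketch under stated assumptions: the abstract does not give the carbon conversion index, so the default `carbon_fraction=0.5` is an assumed placeholder, the molecular-to-atomic weight ratio is taken as 44/12, and the tree weights are hypothetical.

```python
# Ratio of the molecular weight of CO2 (about 44) to the atomic weight of C (about 12).
CO2_PER_C = 44.0 / 12.0


def co2_absorption(tree_weights_t, carbon_fraction=0.5):
    """Convert individual tree weights (tonnes) into tonnes of CO2.

    carbon_fraction is an assumed carbon conversion index; the abstract
    does not state the value that was actually used.
    """
    stand_biomass_t = sum(tree_weights_t)          # stand biomass = sum of tree weights
    carbon_t = stand_biomass_t * carbon_fraction   # biomass -> carbon
    return carbon_t * CO2_PER_C                    # carbon -> CO2


# Hypothetical stand of three trees weighing 4, 5, and 3 tonnes.
print(co2_absorption([4.0, 5.0, 3.0]))
```

With these inputs the stand holds 12 t of biomass, 6 t of carbon under the assumed fraction, and therefore about 22 t of $CO_2$.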

A Study on the Prediction of Community Smart Pension Intention Based on Decision Tree Algorithm

  • Liu, Lijuan;Min, Byung-Won
    • International Journal of Contents / v.17 no.4 / pp.79-90 / 2021
  • With the deepening of population aging, pension has become an urgent problem in most countries. Community smart pension can effectively resolve the problems of traditional pension and meet the personalized, multi-level needs of the elderly. To predict the pension intention of the elderly in the community more accurately, this paper uses the decision tree classification method to classify the pension data. After missing value processing, normalization, discretization, and data specification, a discretized sample data set is obtained. Then, by comparing the information gain and information gain rate of the sample data features, the feature ranking is determined and the C4.5 decision tree model is established. Under 10-fold cross-validation, the model performs well in accuracy, precision, recall, AUC, and other indicators, with a precision of 89.5%, and can provide a basis for government decision-making.
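
The feature-ranking step above compares information gain with the information gain rate (gain ratio), which divides the gain by the entropy of the split itself. The following is a minimal sketch of both quantities for discretized features, not the paper's code. The feature names `income` and `district` and the data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())


def gain_and_gain_ratio(values, labels):
    """Return (information gain, gain ratio) for one discretized feature."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    conditional = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the feature's own value distribution
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical discretized survey data.
intention = ["yes", "yes", "no", "no"]
features = {
    "income": ["low", "low", "high", "high"],
    "district": ["a", "b", "c", "d"],
}
scores = {name: gain_and_gain_ratio(vals, intention) for name, vals in features.items()}
ranking = sorted(scores, key=lambda name: scores[name][1], reverse=True)
print(scores)
print(ranking)
```

Both features reach the same information gain here, but the gain ratio ranks `income` first because `district` achieves its gain only by splitting the data into many small groups.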

Effect of Prior Probabilities on the Classification Accuracy under the Condition of Poor Separability

  • Kim, Chang-Jae;Eo, Yang-Dam;Lee, Byoung-Kil
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.26 no.4 / pp.333-340 / 2008
  • This paper shows that using the prior probabilities of the classes involved improves classification accuracy when separability between classes is poor. Three experimental cases are designed with two LiDAR datasets, considering three classes (building, tree, and flat grass area). Random sampling with human interpretation is used to obtain approximate prior probabilities. The experimental results show that Bayesian classification with appropriate prior probabilities yields better results than classification without priors when the ratio of the prior probability of one class to that of another differs significantly from 1.0.
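
The effect described above can be seen in a one-dimensional sketch of Bayesian classification, where each class score is the prior multiplied by the class likelihood. The Gaussian likelihoods, the class parameters, and the feature value below are hypothetical and are not taken from the paper's LiDAR data.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))


def classify(x, class_params, priors):
    """Return the class with the largest prior * likelihood score."""
    scores = {
        name: priors[name] * gaussian_pdf(x, mean, std)
        for name, (mean, std) in class_params.items()
    }
    return max(scores, key=scores.get)


# Two poorly separated classes: (mean, std) of a hypothetical height feature.
params = {"tree": (10.0, 2.0), "building": (11.0, 2.0)}
x = 10.4

equal_priors = {"tree": 0.5, "building": 0.5}
skewed_priors = {"tree": 0.1, "building": 0.9}
print(classify(x, params, equal_priors))   # tree
print(classify(x, params, skewed_priors))  # building
```

Because the two likelihoods are nearly equal at `x`, the decision is driven almost entirely by the priors once their ratio moves away from 1.0, which matches the condition stated in the abstract.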

A Study on Pre-evaluation of Tree Species Classification Possibility of CAS500-4 Using RapidEye Satellite Imageries (농림위성 활용 수종분류 가능성 평가를 위한 래피드아이 영상 기반 시험 분석)

  • Kwon, Soo-Kyung;Kim, Kyoung-Min;Lim, Joongbin
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.291-304 / 2021
  • Updating a forest type map is essential for sustainable forest resource management and monitoring to cope with climate change and various environmental problems. In response to the need for efficient, wide-area forestry remote sensing, the CAS500-4 (Compact Advanced Satellite 500-4; the agriculture and forestry satellite) project has been confirmed and scheduled for launch in 2023. Before CAS500-4 is launched and put to use, this study aimed to pre-evaluate the possibility of satellite-based tree species classification using RapidEye, which has specifications similar to those of CAS500-4. The study area was the Chuncheon forest management complex, Gangwon-do. Spectral information was extracted from the growing season image, and GLCM texture information was derived from the NIR bands of the growing and non-growing seasons. Both types of information were used for classification with the random forest machine learning method. Tree species were classified into nine classes: coniferous trees (Korean red pine, Korean pine, Japanese larch), broad-leaved trees (Mongolian oak, Oriental cork oak, East Asian white birch, Korean Castanea, and other broad-leaved trees), and mixed forest. Finally, the classification accuracy was calculated by comparing the forest type map with the classification results. The accuracy was 39.41% when only spectral information was used and 69.29% when both spectral and texture information were used. In future work, the applicability of CAS500-4 will be improved by introducing additional variables that more effectively reflect the ecological characteristics of vegetation.

A method of searching the optimum performance of a classifier by testing only the significant events (중요한 이벤트만을 검색함으로써 분류기의 최적 성능을 찾는 방법)

  • Kim, Dong-Hui;Lee, Won Don
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.6 / pp.1275-1282 / 2014
  • Too much information exists in ubiquitous environments, so it is not easy to obtain appropriately classified information from the available data set. The decision tree algorithm is useful in data mining and machine learning systems because it is fast and produces good results on classification problems. Sometimes, however, a decision tree has leaf nodes that contain only a few data or noisy data. The decisions made by those weak leaves are not effective and should therefore be excluded from the decision process. This paper proposes a method that uses a classifier, UChoo, to solve a classification problem, and suggests an effective decision process that involves only the important leaves and excludes the noisy ones. The experiment shows that this method is effective, reduces erroneous decisions, and can be applied when only important decisions should be made.
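
The idea of deciding only at important leaves can be sketched with a small hand-built tree in which each leaf records how many training events reached it, and prediction abstains when that count is too low. This is a generic illustration, not the UChoo classifier; the classes `Leaf` and `Node`, the function `predict`, and the threshold `min_samples` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Leaf:
    label: str
    n_samples: int  # number of training events that reached this leaf


@dataclass
class Node:
    feature: int      # index of the feature tested at this node
    threshold: float  # go left if x[feature] <= threshold, otherwise right
    left: "Union[Node, Leaf]"
    right: "Union[Node, Leaf]"


def predict(tree: Union[Node, Leaf], x: Sequence[float], min_samples: int = 5) -> Optional[str]:
    """Return the leaf label, or None when the leaf is too weakly supported."""
    node = tree
    while isinstance(node, Node):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label if node.n_samples >= min_samples else None


# Hypothetical tree: the leaf "B" was built from only two training events.
tree = Node(0, 0.5, Leaf("A", 40), Node(1, 2.0, Leaf("B", 2), Leaf("C", 30)))
print(predict(tree, [0.2, 0.0]))  # A
print(predict(tree, [0.9, 1.0]))  # None (weak leaf, no decision)
print(predict(tree, [0.9, 3.0]))  # C
```

Returning `None` trades coverage for reliability: fewer events receive a decision, but the decisions that are made come from well-supported leaves.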

A Novel Feature Selection Method for Output Coding based Multiclass SVM (출력 코딩 기반 다중 클래스 서포트 벡터 머신을 위한 특징 선택 기법)

  • Lee, Youngjoo;Lee, Jeongjin
    • Journal of Korea Multimedia Society / v.16 no.7 / pp.795-801 / 2013
  • Recently, the support vector machine has been widely used in various application fields because of its superior classification performance compared with decision trees and neural networks. Since the support vector machine is basically designed for binary classification, an output coding method, which analyzes the classification results of multiple binary classifiers, is used to apply it to multiclass problems. However, previous feature selection methods for output coding based support vector machines selected features that improve the overall classification accuracy rather than the accuracy of each individual classifier. In this paper, we propose a novel feature selection method that finds the features maximizing the classification accuracy of each binary classifier in an output coding based support vector machine. Experimental results showed that the proposed method significantly improved classification accuracy compared with the previous feature selection method.
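
Output coding, as referred to above, assigns each class a binary codeword, trains one binary classifier per bit, and decodes a prediction by choosing the class whose codeword is closest to the vector of classifier outputs. The sketch below shows only the decoding step with Hamming distance; it does not implement the proposed feature selection. The codebook, the class names, and the classifier outputs are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs, codebook):
    """Return the class whose codeword is nearest to the binary classifier outputs."""
    return min(codebook, key=lambda cls: hamming(outputs, codebook[cls]))


# Hypothetical 7-bit codebook for four classes; any two codewords differ in 4 bits.
codebook = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}

# Outputs of the seven binary classifiers, with the first bit wrong for class_3.
outputs = (1, 0, 1, 1, 0, 0, 1)
print(decode(outputs, codebook))  # class_3
```

Because the codewords are far apart, a single wrong binary classifier does not change the decoded class, which is why the accuracy of each individual classifier matters to the overall result.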