• Title/Summary/Keyword: Classification tree

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach (의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.28 no.3 / pp.249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly, based on Korean Welfare Panel survey data. Using these data, we obtained many decision rules for predicting suicidal ideation in the elderly. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive the decision rules. Weka 3.8 was used as the data mining tool, with J48 (Weka's implementation of C4.5) as the decision tree algorithm. The data were divided into learning data and verification data using a 66.6% split. We considered all candidate variables identified in previous studies on predicting suicidal ideation of the elderly; in the end, 99 variables, including the target variable, were used. Classification analysis was performed after applying backward elimination and a sampling technique for data balancing. Findings: There were significant differences between the data sets. The selected data sets yielded different decision trees and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision trees yield rules for suicidal ideation not only in the depressed group but also in the non-depressed group. In developing the predictive model, applying data balancing directly revealed the over-fitting caused by data imbalance. We conclude that the data must be balanced on the target variable to perform a correct classification analysis without over-fitting. Moreover, the model built on balanced data was not inferior in prediction rate to the biased prediction model.
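C4.5, the algorithm behind J48, chooses split attributes by gain ratio: the information gain of a split divided by the split information of the attribute. The following is a minimal sketch for a categorical attribute only, not Weka's implementation; the function names and the toy labels are illustrative assumptions and do not come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: an attribute that separates the classes perfectly,
# and one that carries no information about the class.
ideation = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, ideation), gain_ratio(uninformative, ideation))
```

An attribute with a higher gain ratio is preferred as the split at a node; the normalization by split information penalizes attributes with many distinct values.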

CANCER CLASSIFICATION AND PREDICTION USING MULTIVARIATE ANALYSIS

  • Shon, Ho-Sun;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.706-709 / 2006
  • Cancer is one of the major causes of death; however, the survival rate can be increased if it is discovered at an early stage and treated in time. According to 2002 statistics from the World Health Organization, breast cancer was the most prevalent of all cancers occurring in women worldwide, and it accounts for 16.8% of all cancers affecting Korean women today. To classify breast cancer as benign or malignant, this study applied discriminant analysis and the decision tree technique of data mining to breast cancer data published on the web. Discriminant analysis is a statistical method that derives discriminant criteria and a discriminant function separating population groups from observations drawn from two or more groups, and then uses them to assign an observation to one of those groups. The decision tree analyzes records of previously collected data, expresses the patterns among them as combinations of attributes that characterize each class, and builds a tree-shaped classification model. This type of analysis may provide systematic information in advance on the factors that cause breast cancer and help prevent the risk of recurrence after surgery.

Classification of Land Cover over the Korean Peninsula Using Polar Orbiting Meteorological Satellite Data (극궤도 기상위성 자료를 이용한 한반도의 지면피복 분류)

  • Suh, Myoung-Seok;Kwak, Chong-Heum;Kim, Hee-Soo;Kim, Maeng-Ki
    • Journal of the Korean Earth Science Society / v.22 no.2 / pp.138-146 / 2001
  • The land cover over the Korean peninsula was classified using multi-temporal NOAA/AVHRR (Advanced Very High Resolution Radiometer) data. Four types of phenological data derived from the 10-day composited NDVI (Normalized Difference Vegetation Index), maximum and annual mean land surface temperature, and topographical data were used not only to reduce the data volume but also to increase the classification accuracy. A self-organizing feature map (SOFM), a kind of neural network technique, was used to cluster the satellite data, and a decision tree was used to classify the clusters. When the classification results were compared with the NDVI time series and other available ground truth data, urban areas, agricultural areas, deciduous trees, and evergreen trees were clearly classified.
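NDVI is computed per pixel from red and near-infrared reflectance as (NIR - Red) / (NIR + Red). A minimal sketch follows; the function name, the handling of a zero denominator, and the sample reflectance values are illustrative assumptions and are not taken from the paper.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band; returning 0.0 is an arbitrary choice for this sketch.
        return 0.0
    return (nir - red) / denominator


# Hypothetical reflectances: dense vegetation reflects strongly in the
# near-infrared and weakly in the red band, so its NDVI is high.
print(ndvi(0.5, 0.1))   # vegetated pixel
print(ndvi(0.2, 0.2))   # bare or built-up pixel, NDVI near zero
```

Values range from -1 to 1, which is why a time series of NDVI can serve as a compact phenological signature for each land cover class.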

Weather Classification and Fog Detection using Hierarchical Image Tree Model and k-mean Segmentation in Single Outdoor Image (싱글 야외 영상에서 계층적 이미지 트리 모델과 k-평균 세분화를 이용한 날씨 분류와 안개 검출)

  • Park, Ki-Hong
    • Journal of Digital Contents Society / v.18 no.8 / pp.1635-1640 / 2017
  • In this paper, a hierarchical image tree model for weather classification of a single outdoor image is defined, and a weather classification algorithm using image intensity and a k-means segmentation image is proposed. At the first level of the hierarchical image tree model, indoor and outdoor images are distinguished. At the second level, the intensity and the k-means segmentation image are used to judge whether the outdoor image is a daytime, night, or sunrise/sunset image. At the last level, an image classified as daytime at the second level is finally estimated to be sunny or foggy based on its edge map and fog rate. Experiments were conducted to verify the weather classification, and the results show that the proposed method effectively detects weather features in a given image.

What Characteristics Do Preservice Teachers Show During Trilobite Classification Activities? (예비교사들은 삼엽충 분류활동 중에 어떤 특성을 보이는가?)

  • Lim, Sungman
    • Journal of the Korean Society of Earth Science Education / v.12 no.1 / pp.40-53 / 2019
  • This study analyzed the inquiry characteristics of preservice teachers as they classify trilobites. Seventy preservice teachers attending a teacher training university participated. The classification tasks used in the study were 9 photos of trilobite fossils. In the inquiry activity, the preservice teachers observed trilobite fossils in groups, classified the evolutionary processes of the trilobites, and then constructed a phylogenetic tree. The results of the study are as follows. First, preservice teachers observed the external features of the trilobites and constructed systematic classification results based on their observations. Second, preservice teachers classified trilobites using various classification criteria. Third, the phylogenetic tree of the preservice teachers and the phylogenetic tree of scientists were very similar. The preservice teachers constructed a phylogenetic tree based on observation and inference of the change from a simple form to a complex form, which is the general evolutionary process of trilobite fossils claimed by scientists. These results suggest that group-based inquiry activities with sufficient time are very effective and that the experience of inquiry activities is very important for preservice teachers.

A Study on the Implementation of SQL Primitives for Decision Tree Classification (판단 트리 분류를 위한 SQL 기초 기능의 구현에 관한 연구)

  • An, Hyoung Geun;Koh, Jae Jin
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.855-864 / 2013
  • Decision tree classification is one of the important problems in data mining, and data mining has become an important task in the field of large database technologies. Efforts to couple data mining systems with database systems have therefore led to the development of database primitives that support data mining functions such as decision tree classification. These primitives consist of special database operations that support the SQL implementation of decision tree classification algorithms, and they have become component modules of database systems for implementing specific algorithms. There are two aspects to the development of database primitives that support data mining functions. The first is identifying, through analysis, the common database primitives that support data mining functions. The other is providing an extension mechanism for implementing these primitives as an interface of database systems. Deciding which primitives should be stored in the DBMS is one of the difficult problems in data mining. To address this problem, we describe database primitives that construct and apply optimized decision tree classifiers. We then identify operations that are useful for various classification algorithms and discuss the implementation of these primitives on a commercial DBMS. We implement the primitives on a commercial DBMS and present experimental results comparing their performance.
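One operation that decision tree induction commonly pushes into the database is counting class labels per attribute value, since every split criterion is computed from those counts. The sketch below illustrates that idea with a plain `GROUP BY` in SQLite via Python's `sqlite3` module. It is an assumption about what such a primitive might look like, not the primitives defined in the paper; the table, columns, and the `class_counts` helper are hypothetical.

```python
import sqlite3

# Hypothetical training table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training (outlook TEXT, windy TEXT, play TEXT)")
rows = [
    ("sunny", "no", "no"),
    ("sunny", "yes", "no"),
    ("overcast", "no", "yes"),
    ("rain", "no", "yes"),
    ("rain", "yes", "no"),
]
conn.executemany("INSERT INTO training VALUES (?, ?, ?)", rows)


def class_counts(conn, attribute, target="play"):
    """Return {(attribute_value, class_label): count} for one candidate split attribute."""
    # Column names cannot be bound as SQL parameters, so check them against a whitelist.
    allowed = {"outlook", "windy", "play"}
    if attribute not in allowed or target not in allowed:
        raise ValueError("unknown column")
    query = (
        f"SELECT {attribute}, {target}, COUNT(*) "
        f"FROM training GROUP BY {attribute}, {target}"
    )
    return {(value, label): n for value, label, n in conn.execute(query)}


counts = class_counts(conn, "outlook")
print(counts)
```

Only the aggregated counts leave the database, so the classifier never has to scan the raw rows on the client side.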

Classification Tree Analysis to Assess Contributing Factors Influencing Biosecurity Level on Farrow-to-Finish Pig Farms in Korea (분류 트리 기법을 이용한 국내 일괄사육 양돈장의 차단방역 수준에 영향을 미치는 기여 요인 평가)

  • Kim, Kyu-Wook;Pak, Son-Il
    • Journal of Veterinary Clinics / v.33 no.2 / pp.107-112 / 2016
  • The objective of this study was to determine potential contributing factors associated with the biosecurity level of farrow-to-finish pig farms and to develop a classification tree model to explore how these factors relate to each other. To this end, the author analyzed data (n = 193) extracted from a cross-sectional study of 344 farrow-to-finish farms, conducted between March and September 2014, that explored swine disease status at the farm level. Standardized questionnaires covering basic demographic data and management practices were collected at each farm during on-site visits by trained veterinarians. To classify the data with biosecurity level as the dependent variable, the Chi-squared Automatic Interaction Detection (CHAID) algorithm was applied to build the classification tree. The misclassification risk statistic was used to evaluate the fit of the model in terms of prediction results. Categorical multivariate input data (40 variables) were used to construct the classification tree, and the target variable was biosecurity level dichotomized into low versus high. In general, the level of biosecurity was low in the majority of farms studied, mainly due to the limited implementation of basic on-farm biosecurity measures aimed at controlling the potential introduction and transmission of swine diseases. The CHAID model illustrated the relative importance of the significant predictors in explaining the level of biosecurity: maintenance of medical records of treatment and vaccination, use of dedicated clothing to enter the farm, installation of a fence around the farm perimeter, and periodic monitoring of the herd under a written biosecurity plan. The misclassification risk estimate of the prediction model was 0.145 with a standard error of 0.025, indicating that 85.5% of the cases could be classified correctly using the decision rule based on the current tree. Although the CHAID approach can provide detailed information and insight about interactions among factors associated with biosecurity level, further evaluation of potential bias introduced in the course of data collection should be included in future studies. In addition, there is still a need to validate the findings on an external dataset with a larger sample size to improve the external validity of the current model.
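The accuracy and standard error quoted in this abstract can be checked with simple arithmetic. The sketch below assumes the standard error is the usual binomial one, sqrt(p(1 - p) / n), with n = 193; the abstract does not state which formula was used, and `risk_standard_error` is a name chosen here for illustration.

```python
import math


def risk_standard_error(risk, n):
    """Binomial standard error of a misclassification rate estimated from n cases."""
    return math.sqrt(risk * (1.0 - risk) / n)


risk = 0.145   # misclassification risk reported in the abstract
n = 193        # number of farms analyzed, as reported in the abstract

accuracy = 1.0 - risk                 # share of cases classified correctly
se = risk_standard_error(risk, n)     # assumed binomial standard error

print(f"accuracy = {accuracy:.3f}, standard error = {se:.4f}")
```

Under this assumption the computed standard error agrees with the reported 0.025 to three decimal places, and one minus the risk gives the 85.5% correct classification rate.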

Development of surface defect inspection algorithms for cold mill strip using tree structure (트리 구조를 이용한 냉연 표면흠 검사 알고리듬 개발에 관한 연구)

  • Kim, Kyung-Min;Jung, Woo-Yong;Lee, Byung-Jin;Ryu, Gyung;Park, Gui-Tae
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 1997.10a / pp.365-370 / 1997
  • In this paper we propose surface defect inspection algorithms for cold mill strip using a tree structure. The defects on the surface of cold mill strip have a scattered or singular distribution. The method consists of preprocessing, feature extraction, and defect classification. Preprocessing produces the binarized defect image, using the top-hat transform, adaptive thresholding, thinning, and noise rejection. In particular, the top-hat transform, which uses local min/max operations, diminishes the effect of bad lighting. In feature extraction, geometric, moment, co-occurrence matrix, and histogram-ratio features are calculated; the histogram-ratio feature is taken from the gray-level image. For defect classification, we propose a tree structure whose nodes are multilayer neural network classifiers. The proposed algorithm reduced the error rate compared with a single-stage structure.
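The top-hat transform built from local min/max operations can be illustrated on a one-dimensional intensity profile. The sketch below implements the white top-hat (signal minus its morphological opening) with a flat window; whether the paper uses the white or the black variant, and with what window size, is not stated in the abstract, so those choices and the sample profile are assumptions.

```python
def erode(signal, radius):
    """Local minimum over a window of +/- radius samples (truncated at the borders)."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def dilate(signal, radius):
    """Local maximum over a window of +/- radius samples (truncated at the borders)."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def white_top_hat(signal, radius):
    """Signal minus its opening (erosion followed by dilation)."""
    opened = dilate(erode(signal, radius), radius)
    return [s - o for s, o in zip(signal, opened)]


# Hypothetical scan line: a slow brightness ramp (uneven lighting)
# with one narrow bright defect at index 3.
profile = [10, 11, 12, 30, 14, 15, 16]
result = white_top_hat(profile, radius=1)
print(result)
```

The opening follows the slowly varying background, so subtracting it leaves the narrow defect standing out while the lighting gradient is largely suppressed, which makes a single global threshold more workable.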

DEVELOPING FOREST TYPE CLASSIFICATION METHODOLOGY USING KOMPSAT IMAGE BASED ON TASSELED CAP TRANSFORMATION

  • Kim, Sung-Jae;Jo, Yun-Won;Jo, Myung-Hee
    • Proceedings of the KSRS Conference / 2008.10a / pp.358-360 / 2008
  • Recently, many pilot studies in Korea have explored advanced applications of imagery from KOMPSAT-MSC (Korean Multi-purpose Satellite-Multi-Spectral Camera), Korea's first national high-resolution satellite imagery. In this study, a forest type classification methodology was developed and a forest type distribution map was constructed by applying the Tasseled Cap Transformation to high-resolution KOMPSAT-MSC imagery, in particular by comparing the results with detailed field survey data such as forest type, tree species, tree diameter, tree age, and tree crown density in the pilot study area.

Splitting Decision Tree Nodes with Multiple Target Variables (의사결정나무에서 다중 목표변수를 고려한)

  • 김성준
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.243-246 / 2003
  • Data mining is a process of discovering useful patterns for decision making from large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding classification models. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, such as manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given for illustration, with discussion.
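One simple way to extend node impurity to several target variables is to sum the Gini impurity computed separately for each target. This is only one possible approach, sketched here for illustration; the abstract does not say which measures the article proposes, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels for a single target variable."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def multi_target_gini(rows):
    """Sum of per-target Gini impurities for a node.

    rows: list of tuples, each holding one label per target variable.
    """
    # zip(*rows) regroups the row tuples into one column of labels per target.
    return sum(gini(column) for column in zip(*rows))


# Hypothetical node with two targets (quality outcome, cost level).
mixed_node = [("pass", "low"), ("pass", "high"), ("fail", "low"), ("fail", "high")]
pure_node = [("pass", "low"), ("pass", "low"), ("pass", "low")]
print(multi_target_gini(mixed_node), multi_target_gini(pure_node))
```

A split would then be scored by the weighted decrease in this combined impurity; summing treats the targets as equally important and ignores correlation between them, which is a limitation of this particular sketch.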
