• Title/Summary/Keyword: tree-based classification


A GA-based Binary Classification Method for Bankruptcy Prediction (도산예측을 위한 유전 알고리듬 기반 이진분류기법의 개발)

  • Min, Jae-H.;Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society / v.33 no.2 / pp.1-16 / 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on a genetic algorithm, and to validate its predictive power through empirical analysis. The proposed method establishes virtual companies representing bankrupt and non-bankrupt companies respectively, measures the similarity between each virtual company and the company under prediction, and classifies that company as either bankrupt or non-bankrupt. The values of the classification variables of the virtual companies and the weights of those variables are determined using a genetic algorithm so as to maximize the hit ratio on the training data set. To test the validity of the proposed method, we compare its prediction accuracy with that of existing methods such as multiple discriminant analysis, logistic regression, decision tree, and artificial neural network. The results show that the proposed binary classification method can serve as a promising alternative to the existing methods for bankruptcy prediction.
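
The classification step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes similarity is measured as an inverse weighted Euclidean distance, and the function names, the two-variable prototypes, and the weights are hypothetical. In the paper, the prototype values and weights would be found by a genetic algorithm that maximizes the hit ratio; here they are simply given.

```python
from math import sqrt


def weighted_distance(x, prototype, weights):
    """Weighted Euclidean distance between a company and a virtual company.

    Assumption: a smaller distance means a higher similarity.
    """
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype)))


def classify(x, bankrupt_proto, healthy_proto, weights):
    """Assign x to the class of the closer (more similar) virtual company."""
    d_bankrupt = weighted_distance(x, bankrupt_proto, weights)
    d_healthy = weighted_distance(x, healthy_proto, weights)
    return "bankrupt" if d_bankrupt < d_healthy else "non-bankrupt"


def hit_ratio(samples, labels, bankrupt_proto, healthy_proto, weights):
    """Fraction of training samples classified correctly.

    This is the quantity a genetic algorithm would use as its fitness.
    """
    hits = sum(
        classify(x, bankrupt_proto, healthy_proto, weights) == y
        for x, y in zip(samples, labels)
    )
    return hits / len(samples)


# Hypothetical two-variable example (e.g. a debt ratio and a profitability ratio).
bankrupt_proto = [0.9, 0.1]
healthy_proto = [0.3, 0.6]
weights = [0.5, 0.5]
samples = [[0.8, 0.2], [0.2, 0.7]]
labels = ["bankrupt", "non-bankrupt"]
print(hit_ratio(samples, labels, bankrupt_proto, healthy_proto, weights))
```

A genetic algorithm would treat the concatenation of both prototypes and the weight vector as one chromosome and evaluate each candidate with `hit_ratio`.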

TsCNNs-Based Inappropriate Image and Video Detection System for a Social Network

  • Kim, Youngsoo;Kim, Taehong;Yoo, Seong-eun
    • Journal of Information Processing Systems / v.18 no.5 / pp.677-687 / 2022
  • We propose a detection algorithm based on tree-structured convolutional neural networks (TsCNNs) that finds pornography, propaganda, or other inappropriate content on a social media network. The algorithm sequentially applies the typical convolutional neural network (CNN) algorithm in a tree-like structure to minimize classification errors among similar classes, and thus improves accuracy. We implemented the detection system and conducted experiments on a data set comprising 6 ordinary classes and 11 inappropriate classes collected from the Korean military social network. Each model of the proposed algorithm was trained, and the performance was then evaluated according to the images and videos identified. Experimental results with 20,005 new images showed that the overall accuracy of image identification reached 99.51%, and that the algorithm reduced the identification errors of the typical CNN algorithm by 64.87%. By reducing false alarms in video identification from the domain, the TsCNNs achieved an optimal performance of 98.11% when using 10-minute frame-sampling intervals. This indicates that classification with proper sampling contributes to reducing both the computational burden and false alarms.
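
One way to read "applies the CNN in a tree-like structure" is as a hierarchy of classifiers in which a coarse model routes each sample to a more specialized model for similar classes. The sketch below illustrates only that routing idea under this assumption. The `Node` class, the stub classifiers, and the feature names are hypothetical stand-ins for trained CNNs, and the abstract does not specify the actual tree layout.

```python
class Node:
    """A tree node holding one classifier and optional child nodes."""

    def __init__(self, classify, children=None):
        self.classify = classify        # callable: sample -> label
        self.children = children or {}  # label -> Node that refines that label


def predict(node, sample):
    """Apply classifiers from the root down until a leaf label is reached."""
    label = node.classify(sample)
    child = node.children.get(label)
    return predict(child, sample) if child else label


# Stub classifiers standing in for trained CNNs (hypothetical features).
def coarse(sample):
    return "inappropriate" if sample["score"] > 0.5 else "ordinary"


def fine(sample):
    return "propaganda" if sample["text_ratio"] > 0.5 else "pornography"


root = Node(coarse, children={"inappropriate": Node(fine)})

print(predict(root, {"score": 0.9, "text_ratio": 0.8}))
print(predict(root, {"score": 0.1, "text_ratio": 0.8}))
```

The design intent is that each child model only has to separate a small group of similar classes, which is where a single flat classifier tends to make most of its errors.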

The Accuracy Assessment of Species Classification according to Spatial Resolution of Satellite Image Dataset Based on Deep Learning Model (딥러닝 모델 기반 위성영상 데이터세트 공간 해상도에 따른 수종분류 정확도 평가)

  • Park, Jeongmook;Sim, Woodam;Kim, Kyoungmin;Lim, Joongbin;Lee, Jung-Soo
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1407-1422 / 2022
  • This study classified tree species and assessed the classification accuracy using SE-Inception, a classification-based deep learning model. The dataset was built from Worldview-3 and GeoEye-1 images, and the input images were cut to sizes of 10 × 10 m, 30 × 30 m, and 50 × 50 m to compare the classification accuracy across input sizes. The label data covered five tree species (Pinus densiflora, Pinus koraiensis, Larix kaempferi, Abies holophylla Maxim., and Quercus); labels were assigned manually by visually interpreting the divided images. The dataset comprised a total of 2,429 images, of which about 85% were used as training data and about 15% as validation data. Classification with the deep learning model achieved an overall accuracy of up to 78% with the Worldview-3 images and up to 84% with the GeoEye-1 images, indicating high classification performance. In particular, Quercus showed a high F1 score of more than 85% regardless of the input image size, but species with similar spectral characteristics, such as Pinus densiflora and Pinus koraiensis, were frequently misclassified. Therefore, feature extraction from the spectral information of satellite images alone may be limited, and classification accuracy may be improved by using images that contain additional pattern information such as vegetation indices and the Gray-Level Co-occurrence Matrix (GLCM).

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching (Doc2Vec 모형에 기반한 자기소개서 분류 모형 구축 및 실험)

  • Kim, Young Soo;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.19 no.1 / pp.103-112 / 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Self-introduction essays are nowadays among the most widely used tools in the job search process. Companies spend time and money reviewing the numerous self-introduction essays submitted by job seekers, and job seekers in turn worry about whether their essays will be accepted. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Stratified sampling was applied to the real-world data to alleviate the imbalance between passed and failed essays. Documents were embedded using the Doc2Vec method, which was developed from Word2Vec, and were classified using logistic regression. The decision tree model was chosen as a benchmark, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM was better than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, whereas PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the logistic regression model with PV-DM embeddings performs better than the decision tree-based classification model. The implication of the experimental results is that companies can reduce the cost of recruiting good job seekers. In addition, the suggested model can help job candidates pre-evaluate their self-introduction essays.

The Development of Major Tree Species Classification Model using Different Satellite Images and Machine Learning in Gwangneung Area (이종센서 위성영상과 머신 러닝을 활용한 광릉지역 주요 수종 분류 모델 개발)

  • Lim, Joongbin;Kim, Kyoung-Min;Kim, Myung-Kil
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1037-1052 / 2019
  • In a preceding study, we developed a classification model for Korean pine and larch with an accuracy of 98 percent using Hyperion and Sentinel-2 satellite images, texture information, and geometric information, as the first step toward tree species mapping in inaccessible North Korea. Considering the shares of the major tree species in North Korea, the classification model needs to be expanded, because oak (29.5%), pine (12.7%), and fir (8.2%) account for large shares alongside larch (17.5%) and Korean pine (5.8%). To classify the 5 major tree species, the national forest type map of South Korea was used to build 11,039 training and 2,330 validation samples. Sentinel-2 data were used to derive spectral information, and PlanetScope data were used to generate texture information. Geometric information was built from SRTM DEM data. Random forest was used as the machine learning algorithm. As a result, the overall classification accuracy was 80% with a kappa statistic of 0.80. Based on the training data and the classification model constructed in this study, we will extend the application to Mt. Baekdu and the North and South Goseong areas to confirm the applicability of tree species classification on the Korean Peninsula.
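
For readers unfamiliar with the two reported metrics, the sketch below shows how overall accuracy and Cohen's kappa are commonly computed from a confusion matrix. The matrix is made up for illustration and does not reproduce the paper's results; the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if predictions were independent of the reference labels.
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up 2-class confusion matrix (rows: reference, columns: predicted).
cm = [[40, 10],
      [10, 40]]
print(overall_accuracy(cm), cohen_kappa(cm))
```

Because kappa subtracts the agreement expected by chance, it is normally lower than the overall accuracy whenever the accuracy is below 100%.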

Sasang Constitution Detection Based on Facial Feature Analysis Using Explainable Artificial Intelligence (설명가능한 인공지능을 활용한 안면 특징 분석 기반 사상체질 검출)

  • Jeongkyun Kim;Ilkoo Ahn;Siwoo Lee
    • Journal of Sasang Constitutional Medicine / v.36 no.2 / pp.39-48 / 2024
  • Objectives: The aim was to develop a method for detecting Sasang constitution based on the ratio of facial landmarks and to provide an objective and reliable tool for Sasang constitution classification. Methods: Facial images, KS-15 scores, and certainty scores were collected from subjects identified by the Korean Medicine Data Center. Facial ratio landmarks were detected, yielding 2279 facial ratio features. Tree-based models were trained to classify Sasang constitution, and Shapley Additive Explanations (SHAP) analysis was employed to identify important facial features. Additionally, Body Mass Index (BMI) and a personality questionnaire were incorporated as supplementary information to enhance model performance. Results: Using the tree-based models, the accuracy for classifying Taeeum, Soeum, and Soyang constitutions was 81.90%, 90.49%, and 81.90%, respectively. SHAP analysis revealed important facial features, while the inclusion of BMI and the personality questionnaire improved model performance. This demonstrates that facial ratio-based Sasang constitution analysis yields effective and accurate classification results. Conclusions: Facial ratio-based Sasang constitution analysis provides rapid and objective results compared to traditional methods. This approach holds promise for enhancing personalized medicine in Korean traditional medicine.

Individual-based Competition Analysis for Secondary Forest in Northeast China

  • Li, Fengri;Chen, Dongsheng;Lu, Jun
    • Journal of Korean Society of Forest Science / v.97 no.5 / pp.501-507 / 2008
  • Crown width in 4 directions, DBH, tree height, and coordinates of sample trees were collected from 30 permanent sample plots in the secondary forest of the Maoershan Experimental Forestry Farm, Northeast China. In this paper, competition among individual trees within a stand is discussed for secondary forest using the iterative Hegyi competition index and a crown overlap index, which represent the competitive and cooperative interactions among neighboring trees. Active competitors of the subject tree in the competition zone were selected to calculate the iterative competition index. Using the results of crown classification based on the equal crown projection area, a new distance-dependent competition index called the crown overlap index (COI) was developed for secondary forest. The COI described crown competition better than the crown competition factor (CCF). The individual-based competition indices discussed in this paper will provide a more precise basis for developing individual tree growth models for secondary forest, and they can also be used to adjust stand structure for spatially optimal management.
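
As background for the index named above, the classic Hegyi competition index sums, over the competitors of a subject tree, the ratio of competitor DBH to subject DBH divided by the distance between the two trees. The sketch below implements only this basic form with a fixed-radius competition zone, which is an assumption made here; the paper's iterative selection of active competitors and its crown overlap index are not reproduced. Tree tuples and the radius are hypothetical.

```python
from math import hypot


def hegyi_index(subject, neighbors, radius):
    """Basic Hegyi competition index for one subject tree.

    Trees are (x, y, dbh) tuples. Assumption: every neighbor within
    `radius` of the subject counts as a competitor.
    """
    x0, y0, dbh0 = subject
    index = 0.0
    for x, y, dbh in neighbors:
        dist = hypot(x - x0, y - y0)
        if 0 < dist <= radius:
            # Larger and closer neighbors contribute more competition.
            index += (dbh / dbh0) / dist
    return index


# Hypothetical plot: coordinates in metres, DBH in centimetres.
subject = (0.0, 0.0, 20.0)
neighbors = [(3.0, 4.0, 30.0), (0.0, 2.0, 10.0), (10.0, 10.0, 40.0)]
print(hegyi_index(subject, neighbors, radius=6.0))
```

In this example the third neighbor lies outside the 6 m zone and is ignored, so only the two nearer trees contribute to the index.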

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.141-150 / 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially those using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires contiguous, narrow spectral bands over a wide range. However, high dimensionality results in unreliable classifier estimates and a high computational burden. Therefore, reducing the dimensionality of hyperspectral imagery is necessary. In this study, the Random Forest (RF) classifier was used for both dimensionality reduction and classification. RF is an ensemble-learning algorithm based on the Classification and Regression Tree (CART), which has gained attention due to its high classification accuracy and fast processing speed. The RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated area in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameter optimization was conducted to maximize the classification accuracy. Then, dimensionality reduction was conducted based on RF variable importance. The results show that using only the selected bands yields excellent classification accuracy without using the whole dataset. Moreover, a majority of the selected bands are concentrated in the visible (VIS) region, especially the region related to chlorophyll content. Therefore, it can be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.

Prefix Cuttings for Packet Classification with Fast Updates

  • Han, Weitao;Yi, Peng;Tian, Le
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.4 / pp.1442-1462 / 2014
  • Packet classification is a key Internet technology that lets routers classify arriving packets into different flows according to predefined rulesets. Previous packet classification algorithms have mainly focused on search speed and memory usage, while overlooking update performance. In this paper, we propose PreCuts, which can drastically improve the update speed. Based on the characteristics of the IP fields, we implement three heuristics to build a 3-layer decision tree. In the first layer, we group the rules that have the same highest byte of source and destination IP addresses. In the second layer, we cluster the rules that share the same IP prefix length. Finally, we use an information entropy-based bit partition heuristic to choose specific bits of the IP prefix to split the ruleset into subsets. The heuristics of PreCuts do not introduce rule duplication, and incremental updates do not degrade the time and space performance. Experiments with ClassBench show that, compared with BRPS and EffiCuts, the proposed algorithm not only improves the time and space performance, but also greatly increases the update speed.
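
The first two layers described in this abstract can be sketched roughly as below. This is an illustration under stated assumptions rather than the PreCuts implementation: rules are hypothetical `(name, source prefix, destination prefix)` tuples over IPv4, every prefix is assumed to be at least 8 bits long so that its highest byte is fully specified, the second layer is assumed to key on the pair of source and destination prefix lengths, and the third, entropy-based bit-partition layer is omitted.

```python
from collections import defaultdict
from ipaddress import ip_network


def high_byte(prefix):
    """Highest byte of an IPv4 prefix, e.g. 10 for '10.1.0.0/16'."""
    return int(ip_network(prefix).network_address) >> 24


def build_layers(rules):
    """Group rules by highest bytes (layer 1), then by prefix lengths (layer 2)."""
    tree = defaultdict(lambda: defaultdict(list))
    for name, src, dst in rules:
        layer1_key = (high_byte(src), high_byte(dst))
        layer2_key = (ip_network(src).prefixlen, ip_network(dst).prefixlen)
        # Each rule is stored in exactly one bucket, so no rule is duplicated.
        tree[layer1_key][layer2_key].append(name)
    return tree


# Hypothetical ruleset.
rules = [
    ("r1", "10.1.0.0/16", "192.168.1.0/24"),
    ("r2", "10.2.0.0/16", "192.168.2.0/24"),
    ("r3", "10.3.4.0/24", "192.168.0.0/16"),
    ("r4", "172.16.0.0/12", "10.0.0.0/8"),
]
tree = build_layers(rules)
print(tree[(10, 192)][(16, 24)])
```

Because inserting or deleting a rule touches only one bucket, an update in this layout does not require rebuilding other parts of the tree, which is consistent with the fast-update goal the abstract states.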