• Title/Abstract/Keywords: tree classification method

Search Results: 361

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management / v.34 no.2 / pp.129-135 / 2006
  • With advances in computer technology, it is now possible to store in a database all the information needed to monitor whether equipment is in control, together with huge amounts of real-time manufacturing data. Statistical analysis of large data sets with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing for many observations, is therefore needed, even though it is a formidable computational task. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables from the trivial many. In this paper, we review and summarize the CART, C4.5, and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables screened in common by all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
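The screening rule above (keep only the variables that CART, C4.5, and CHAID all select) can be sketched as follows. This is a simplified illustration, not the authors' procedure: each algorithm is reduced to its split criterion evaluated on a single multiway split of a categorical predictor (Gini decrease for CART, gain ratio for C4.5, the Pearson chi-square statistic for CHAID) instead of growing full trees, and the cutoff `k` and the toy data in the test are assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def gini(values):
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def partition(x, y):
    """Group the target values y by the category of predictor x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return list(groups.values())


def gini_decrease(x, y):
    """CART-like criterion (multiway split here; CART itself splits in two)."""
    n = len(y)
    return gini(y) - sum(len(g) / n * gini(g) for g in partition(x, y))


def gain_ratio(x, y):
    """C4.5 criterion: information gain divided by split information."""
    n = len(y)
    gain = entropy(y) - sum(len(g) / n * entropy(g) for g in partition(x, y))
    split_info = entropy(x)
    return gain / split_info if split_info > 0 else 0.0


def chi_square(x, y):
    """CHAID-like criterion: Pearson chi-square statistic of the x-by-y table.
    (CHAID proper also merges categories and compares Bonferroni-adjusted p-values.)"""
    n = len(y)
    rows, cols, cells = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat


def vital_few(data, y, k):
    """Variables ranked in the top k by all three criteria (k is an assumed cutoff)."""
    common = set(data)
    for criterion in (gini_decrease, gain_ratio, chi_square):
        ranked = sorted(data, key=lambda name: criterion(data[name], y), reverse=True)
        common &= set(ranked[:k])
    return common
```

Taking the intersection makes the screen conservative: a variable survives only if every criterion ranks it highly, which is the point of selecting "common variables" across the three algorithms.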

Hierarchy analysis of computationally proposed 100 cases of new digital games based on the expected marketability (컴퓨테이셔널 방법론에 따라 제안된 100가지 미개발 게임 유형들에 대한 기대 시장성 기준의 위계 분석)

  • Kim, Ikhwan
    • Journal of Korea Game Society / v.19 no.5 / pp.133-142 / 2019
  • In this study, 100 types of computationally proposed digital games were analyzed on the basis of their expected marketability. The study adopted the game classification methodology with five classification criteria proposed by Kim (2017) and an elimination method based on a decision tree. As a result, the digital games could be classified into three groups. By following the proposed hierarchy, designers in the field will be able to use computational design methodology to develop new types of digital games more efficiently.

Sasang Constitution Detection Based on Facial Feature Analysis Using Explainable Artificial Intelligence (설명가능한 인공지능을 활용한 안면 특징 분석 기반 사상체질 검출)

  • Jeongkyun Kim;Ilkoo Ahn;Siwoo Lee
    • Journal of Sasang Constitutional Medicine / v.36 no.2 / pp.39-48 / 2024
  • Objectives: The aim was to develop a method for detecting Sasang constitution based on ratios of facial landmarks and to provide an objective and reliable tool for Sasang constitution classification. Methods: Facial images, KS-15 scores, and certainty scores were collected from subjects identified by the Korean Medicine Data Center. Facial landmarks were detected, yielding 2279 facial ratio features. Tree-based models were trained to classify Sasang constitution, and Shapley Additive Explanations (SHAP) analysis was employed to identify important facial features. Body Mass Index (BMI) and a personality questionnaire were also incorporated as supplementary information to enhance model performance. Results: Using the tree-based models, the accuracy for classifying the Taeeum, Soeum, and Soyang constitutions was 81.90%, 90.49%, and 81.90%, respectively. SHAP analysis revealed important facial features, and including BMI and the personality questionnaire improved model performance. These results indicate that facial ratio-based Sasang constitution analysis yields effective and accurate classification. Conclusions: Facial ratio-based Sasang constitution analysis provides faster and more objective results than traditional methods. This approach holds promise for enhancing personalized medicine in traditional Korean medicine.
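SHAP attributes a model's prediction to its input features using Shapley values. The brute-force sketch below computes exact Shapley values for a small model by enumerating every feature subset, with "absent" features replaced by a single baseline vector. That is a simplification: SHAP libraries average over background data and use much faster tree-specific algorithms. The toy tree, its thresholds, and the reading of its inputs as facial ratios are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumeration (exponential in the number of features).

    phi_i = sum over subsets S of the other features of
            |S|! (n - |S| - 1)! / n! * (v(S with i) - v(S)),
    where v(S) evaluates the model with features outside S set to the baseline.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual values; the rest use the baseline.
        return model([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_tree(z):
    """Hypothetical depth-2 tree over three hypothetical facial ratio features."""
    if z[0] > 0.5:
        return 1.0 if z[1] > 0.3 else 0.0
    return 0.0
```

For `shapley_values(toy_tree, [0.6, 0.4, 0.9], [0.0, 0.0, 0.0])` the attributions are 0.5, 0.5, and 0.0: they sum to `toy_tree(x) - toy_tree(baseline)` (the efficiency property), the two features the tree actually tests share the credit equally, and the unused third feature gets none.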

Context-adaptive Smoothing for Speech Synthesis (음성 합성기를 위한 문맥 적응 스무딩 필터의 구현)

  • 이기승;김정수;이재원
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.285-292 / 2002
  • One of the problems that must be solved in Text-To-Speech (TTS) is discontinuity at unit-joining points. To cope with this problem, this paper employs a smoothing method using a low-pass filter. In the proposed smoothing method, the filter coefficient that controls the amount of smoothing is determined according to the context information of the speech to be synthesized. This method efficiently reduces both the discontinuities at unit-joining points and the artifacts caused by undesired smoothing. The amount of smoothing is determined from the discontinuities around unit-joining points in the currently synthesized speech and the discontinuities predicted from context. The discontinuity predictor is implemented as a CART with context feature variables. To evaluate the proposed method, a corpus-based concatenative TTS system was used as the baseline. More than 6075 of listeners judged the quality of speech synthesized with the proposed smoothing to be superior to that of unsmoothed synthesized speech in both naturalness and intelligibility.
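A minimal sketch of context-adaptive smoothing at a single unit join, under stated assumptions. The smoothing strength comes from a hypothetical rule that smooths only the part of the observed discontinuity that the context-predicted discontinuity (which the paper obtains from a CART predictor) does not explain. The filter is a one-pole low-pass filter applied to a parameter track over a few frames around the join. The abstract does not give the paper's actual coefficient mapping or filter design.

```python
def smoothing_strength(observed_jump, predicted_jump):
    """Hypothetical rule returning a coefficient in [0, 1]; 0 means no smoothing.

    Both arguments are discontinuity magnitudes. Only the part of the observed
    jump that the context-based prediction does not explain is treated as an
    artifact to smooth away.
    """
    if observed_jump <= 0:
        return 0.0
    excess = max(observed_jump - predicted_jump, 0.0)
    return min(excess / observed_jump, 1.0)


def smooth_join(track, join, strength, width=3):
    """One-pole low-pass filter y[n] = a * y[n-1] + (1 - a) * x[n] with a = strength,
    applied to frames join-width .. join+width-1 of a parameter track."""
    out = list(track)
    for n in range(max(join - width, 1), min(join + width, len(track))):
        out[n] = strength * out[n - 1] + (1 - strength) * track[n]
    return out
```

When the predicted discontinuity matches the observed one (a jump that the context says is natural), the strength is 0 and the track passes through unchanged; a large unexplained jump drives the strength toward 1 and spreads the step over several frames.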

An Incremental Web Document Clustering Based on the Transitive Closure Tree (이행적 폐쇄트리를 기반으로 한 점증적 웹 문서 클러스터링)

  • Youn Sung-Dae;Ko Suc-Bum
    • Journal of Korea Multimedia Society / v.9 no.1 / pp.1-10 / 2006
  • In document clustering, the k-means algorithm and Hierarchical Agglomerative Clustering (HAC) are often used. The k-means algorithm has the advantage in processing time, whereas HAC has the advantage in classification precision. Conversely, each method has the other's weakness: k-means yields lower classification quality, and HAC is slow. Both methods also share a serious problem: document similarities must be computed whenever a new document is inserted into a cluster. A key property of web resources is that information accumulates as new documents are frequently added. We therefore propose a new transitive closure tree method, based on HAC, that improves the processing time of document clustering, together with an improved incremental clustering method for inserting new documents and deleting documents from a cluster. The proposed method is compared with existing algorithms in terms of precision, recall, F-measure, and processing time, and we present the experimental results.
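One way to read transitive-closure clustering, offered as an assumption rather than the paper's algorithm: treat "cosine similarity ≥ threshold" as a relation between documents and take clusters as the equivalence classes of its transitive closure (equivalently, the connected components of the similarity graph). With a union-find structure, inserting a document only requires comparing it with the stored documents; previously computed pairs are never revisited. Deletion, which the paper also supports, is not handled here, because union-find cannot split a component.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TransitiveClosureClusters:
    """Clusters = equivalence classes of the transitive closure of
    'cosine(d1, d2) >= threshold', maintained incrementally with union-find."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.docs = []
        self.parent = []

    def _find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def insert(self, doc):
        """Add a document; only its similarities to stored documents are computed."""
        new = len(self.docs)
        self.docs.append(doc)
        self.parent.append(new)
        for old in range(new):
            if cosine(doc, self.docs[old]) >= self.threshold:
                self.parent[self._find(old)] = self._find(new)
        return new

    def clusters(self):
        groups = {}
        for i in range(len(self.docs)):
            groups.setdefault(self._find(i), []).append(i)
        return sorted(groups.values())
```

Because the closure is transitive, two documents can share a cluster without being directly similar, as long as a chain of similar documents links them; this is the same chaining behavior as single-linkage HAC cut at the threshold.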


Analysis and Detection Method for Line-shaped Echoes using Support Vector Machine (Support Vector Machine을 이용한 선에코 특성 분석 및 탐지 방법)

  • Lee, Hansoo;Kim, Eun Kyeong;Kim, Sungshin
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.665-670 / 2014
  • An SVM is a binary classifier that finds the optimal hyperplane separating the training data into two groups. Owing to its remarkable performance, the SVM has been applied in various fields, such as inductive inference, binary classification, and prediction. It is also a representative black-box model, and methods for analyzing trained SVM classifiers are an actively discussed research topic. Because line-shaped echoes (sun strobe echoes and radial interference echoes) appear relatively often and disturb the weather forecasting process, this paper studies a method for detecting them automatically with the SVM algorithm. Using a spatial clustering method and corrected reflectivity data from the weather radar, the training data are built from features such as mean reflectivity, size, appearance, and centroid altitude. The trained SVM classifier is verified with actual occurrences of line-shaped echoes, and its characteristics are analyzed using the decision tree method.
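Analyzing a trained black-box classifier with a decision tree is commonly done by fitting the tree to the classifier's own predictions (a surrogate model). The sketch below fits a one-level tree (a decision stump) to the outputs of a stand-in linear decision function. The weights, feature names, and sample values are hypothetical; a real analysis would use the trained SVM's predictions on the radar training data and, typically, a deeper tree.

```python
def linear_decision(row, weights, bias):
    """Stand-in for a trained linear SVM: +1 if w.x + b > 0, else -1."""
    score = sum(w * v for w, v in zip(weights, row)) + bias
    return 1 if score > 0 else -1


def fit_stump(X, labels, names):
    """Depth-1 decision tree that best reproduces `labels`.

    Returns (errors, feature_name, threshold, sign): the stump predicts `sign`
    when row[feature] > threshold, else `-sign`.
    """
    best = None
    for j, name in enumerate(names):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent observed values
            for sign in (1, -1):
                errors = sum(1 for row, lab in zip(X, labels)
                             if (sign if row[j] > threshold else -sign) != lab)
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, sign)
    return best
```

If the stump reproduces the black-box labels with few errors, its chosen feature and threshold give a human-readable summary of what the classifier relies on; here the stand-in model's decisions are fully explained by a mean-reflectivity threshold.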

RFA: Recursive Feature Addition Algorithm for Machine Learning-Based Malware Classification

  • Byeon, Ji-Yun;Kim, Dae-Ho;Kim, Hee-Chul;Choi, Sang-Yong
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.61-68 / 2021
  • Recently, various technologies that use machine learning to classify malicious code have been studied. To make machine learning effective, the most important step is extracting features that distinguish malicious code from normal binaries. In this paper, we propose a recursive feature extraction method for machine learning. The proposed method recursively evaluates individual features and selects the final feature set that maximizes machine learning performance. Specifically, at each stage it extracts the best-performing feature among the individual features and then combines the extracted features. We extract features with the proposed method, apply them to machine learning algorithms such as Decision Tree, SVM, Random Forest, and KNN, and validate that machine learning performance improves as the stages proceed.
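The stage-wise procedure described above behaves like greedy forward feature selection: at each stage, every remaining feature is tried together with the features already chosen, and the best one is kept while the score still improves. The sketch assumes a caller-supplied `score` function (in practice, for example, the cross-validated accuracy of a Decision Tree, SVM, Random Forest, or KNN trained on the chosen features). The feature names and the toy score are hypothetical.

```python
def recursive_feature_addition(features, score):
    """Greedy stage-wise selection: add the feature that most improves `score`,
    stopping when no remaining feature improves it."""
    selected, remaining = [], list(features)
    best_score = float("-inf")
    history = []  # (features after this stage, score at this stage)
    while remaining:
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda pair: pair[1])
        if candidate_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
        history.append((list(selected), candidate_score))
    return selected, history


# Hypothetical toy score: per-feature gains minus a fixed cost per feature.
GAINS = {"opcode_ngram": 0.6, "api_calls": 0.25, "file_size": 0.05, "entropy": 0.02}


def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 0.03 * len(subset)
```

With this toy score the procedure stops after three stages, because adding `entropy` would lower the score; `history` records the score after each stage, which is the kind of stage-by-stage improvement the abstract reports.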

Application of Regression Tree Model for the Estimation of Groundwater Use at the Agricultural (Dry-field Farming and Rice Farming) Purpose Wells (농업용(전작 및 답작용) 지하수 이용량 추정을 위한 회귀나무 모형의 적용)

  • Kim, Yoo-Bum;Hwang, Chan-Ik
    • The Journal of Engineering Geology / v.29 no.4 / pp.417-425 / 2019
  • Agricultural groundwater use accounts for 51.8% of total groundwater use, so accurate estimation of groundwater use is important for efficient groundwater management. The purpose of this study is to develop a method for estimating the groundwater use of agricultural (rice farming and dry-field farming) wells with a regression tree model based on data measured at 370 wells. Three input variables were evaluated as significant: well depth, pipe diameter, and pump capacity, with relative importances of 75%, 17%, and 8%, respectively. The daily usage of agricultural (rice farming and dry-field farming) wells estimated by the regression tree model was very similar to the actual usage, and closer to it than the estimate from the previous method proposed by the Ministry of Construction and Transportation. The reliability of the usage statistics is expected to improve if additional observed data are secured and the classification method is refined.
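A minimal CART-style regression tree, assuming squared-error splitting and the common definition of variable importance as each variable's total reduction in squared error, normalized to sum to 1 (the kind of percentages reported above). The depth limit, the feature layout, and the toy data in the test are assumptions, not the 370-well data set.

```python
def sse(ys):
    """Sum of squared deviations from the mean (the regression-tree impurity)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(X, y, min_leaf):
    """Return (sse_after_split, feature, threshold) for the best binary split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            after = sse(left) + sse(right)
            if best is None or after < best[0]:
                best = (after, j, thr)
    return best


def grow(X, y, importance, depth=0, max_depth=2, min_leaf=1):
    """Grow the tree; each split adds its SSE reduction to its feature's importance."""
    node_sse = sse(y)
    split = best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= node_sse:
        return sum(y) / len(y)  # leaf: predict the mean
    after, j, thr = split
    importance[j] += node_sse - after
    left = [(r, v) for r, v in zip(X, y) if r[j] <= thr]
    right = [(r, v) for r, v in zip(X, y) if r[j] > thr]
    return (j, thr,
            grow([r for r, _ in left], [v for _, v in left], importance,
                 depth + 1, max_depth, min_leaf),
            grow([r for r, _ in right], [v for _, v in right], importance,
                 depth + 1, max_depth, min_leaf))


def predict(node, row):
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if row[j] <= thr else right
    return node


def fit(X, y, **kwargs):
    """Return (tree, normalized importances)."""
    importance = [0.0] * len(X[0])
    tree = grow(X, y, importance, **kwargs)
    total = sum(importance)
    return tree, [imp / total if total else 0.0 for imp in importance]
```

In the toy data used to check this sketch, the first column (standing in for well depth) drives most of the variation, so it receives about 94% of the importance and the second column the rest.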

Grid Resource Selection System Using Decision Tree Method (의사결정 트리 기법을 이용한 그리드 자원선택 시스템)

  • Noh, Chang-Hyeon;Cho, Kyu-Cheol;Ma, Yong-Beom;Lee, Jong-Sik
    • Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.1-10 / 2008
  • For high-performance data processing in the grid environment, effective resource selection is needed because grid resources consist of heterogeneous networks and operating systems. In this paper, we classify grid resources by data properties and user requirements for resource selection using a decision tree method. Through decision tree classification, our method provides grid users with a suitable resource selection methodology. We evaluate the performance of our grid system in terms of throughput, utilization, job loss, and average turn-around time, and compare the experimental results of our resource selection model with those of existing resource selection models such as Condor-G and Nimrod-G. The results show that our resource selection model offers an efficient approach to grid resource selection.
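The four evaluation measures named above can be computed from a job log as sketched below. The definitions are common ones assumed for illustration (throughput as completed jobs per unit time, utilization as busy time over total resource time, job loss as the number of jobs that never completed, turn-around time as completion minus submission); the paper may define them differently, and the `Job` record is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    submit: float
    start: Optional[float] = None   # None if the job never started
    finish: Optional[float] = None  # None if the job was lost


def grid_metrics(jobs: List[Job], n_resources: int, horizon: float) -> dict:
    """Assumed metric definitions over an observation window of length `horizon`."""
    done = [j for j in jobs if j.finish is not None]
    busy = sum(j.finish - j.start for j in done)  # total time resources spent running jobs
    return {
        "throughput": len(done) / horizon,
        "utilization": busy / (n_resources * horizon),
        "job_loss": len(jobs) - len(done),
        "avg_turnaround": sum(j.finish - j.submit for j in done) / len(done) if done else 0.0,
    }
```

Computing all four from the same log makes it straightforward to compare a decision-tree-based selector against other schedulers (such as the Condor-G and Nimrod-G baselines mentioned above) on identical workloads.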


Rural Land Cover Classification using Multispectral Image and LIDAR Data (다중분광영상과 LIDAR자료를 이용한 농업지역 토지피복 분류)

  • Jang Jae-Dong
    • Korean Journal of Remote Sensing / v.22 no.2 / pp.101-110 / 2006
  • The accuracy of rural land cover classification using airborne multispectral images and LIDAR (Light Detection And Ranging) data was analyzed. The multispectral image consists of three bands: green, red, and near infrared. An intensity image was derived from the LIDAR first returns, and a vegetation height image was calculated as the difference between the elevation of the first returns and a DEM (Digital Elevation Model) derived from the LIDAR last returns. Using the maximum likelihood classification method, the three multispectral bands, the LIDAR vegetation height image, and the intensity image were employed for land cover classification. The overall accuracy of classification using all five images was 85.6%, about 10% higher than that using only the three multispectral bands. Combining multispectral and LIDAR images improved the accuracy of the rural land cover map because the LIDAR data clearly distinguished the heights of different crops and of crops versus trees, and because LIDAR intensity was also used for classification.
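Two steps from the abstract, sketched under simplifying assumptions: the vegetation height image as the per-pixel difference between first-return elevation and the last-return DEM, and maximum likelihood classification with one Gaussian per land cover class. Standard maximum likelihood classification uses a full covariance matrix per class; this sketch assumes a diagonal covariance (bands independent within a class) to stay short. The class names, the two-feature layout (near infrared, vegetation height), and the sample values are hypothetical.

```python
from math import log, pi


def vegetation_height(first_return, dem):
    """Per-pixel height: first-return elevation minus the last-return DEM."""
    return [f - g for f, g in zip(first_return, dem)]


def fit_classes(samples):
    """samples: {class_name: [feature_vector, ...]} -> {class_name: (means, variances)}."""
    model = {}
    for name, rows in samples.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[k] for r in rows) / n for k in range(d)]
        # Small constant keeps a constant band from producing zero variance.
        variances = [sum((r[k] - means[k]) ** 2 for r in rows) / n + 1e-6 for k in range(d)]
        model[name] = (means, variances)
    return model


def classify(model, x):
    """Maximum likelihood: pick the class whose Gaussian gives x the highest log-likelihood."""
    def log_likelihood(means, variances):
        return -0.5 * sum(log(2 * pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, means, variances))
    return max(model, key=lambda name: log_likelihood(*model[name]))
```

The toy classes used to check this sketch have nearly identical near-infrared values, so the spectral band alone cannot separate them; the height feature does, which mirrors the abstract's point that LIDAR heights distinguish crops from trees.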