• Title/Summary/Keyword: classification tree (분류트리)

Combining Multiple Neural Networks by Dempster's Rule of Combination for ARMA Model Identification (Dempster's Rule of Combination을 이용한 인공신경망간의 결합에 의한 ARMA 모형화)

  • Oh, Sang-Bong
    • Journal of Information Technology Application / v.1 no.3_4 / pp.69-90 / 1999
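  • This paper addresses ARMA modeling of time series data. It improves the neural network structure of the neural-network-based decision tree classifier, a hierarchical problem-solving approach, and proposes a parallel ARMA modeling methodology in which the outputs of multiple neural networks, each solving a local problem, are combined using Dempster's rule of combination. This compensates for the drawbacks of the methodology based on the decision tree classifier. The proposed methodology consists of three stages: 1) ESACF feature vector extraction; 2) partial modeling by individual neural networks; 3) conflict resolution. The methodology was validated on simulated data and real time series data, and the experiments showed higher ARMA model identification accuracy than previous studies.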
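The conflict resolution stage relies on Dempster's rule of combination. A minimal sketch of that rule follows, assuming each network's output has already been turned into a mass function over sets of candidate model orders. The network names, candidate orders, and mass values are hypothetical and are not taken from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozensets of hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass that the two sources assign to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {h: w / norm for h, w in combined.items()}


# Hypothetical candidate models and network outputs (illustration only).
AR1, MA1, ARMA11 = "AR(1)", "MA(1)", "ARMA(1,1)"
ALL = frozenset([AR1, MA1, ARMA11])
net_a = {frozenset([AR1]): 0.6, frozenset([AR1, ARMA11]): 0.3, ALL: 0.1}
net_b = {frozenset([ARMA11]): 0.5, frozenset([AR1, ARMA11]): 0.4, ALL: 0.1}

fused = dempster_combine(net_a, net_b)
best = max(fused, key=fused.get)  # hypothesis set with the largest fused mass
```

The normalization by `1 - conflict` redistributes the mass that the two sources placed on incompatible hypotheses, which is what lets disagreeing local classifiers be merged into one decision.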

A Study on Smoker Prediction Using Machine Learning Algorithm (기계학습 알고리즘을 이용한 흡연자 예측 연구)

  • Jongwoo Baek;Joonil Bang;Joowon Lee;Hwajong Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.537-538 / 2023
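  • In this paper, we used two machine learning algorithms, random forest and gradient boosting trees, to analyze the correlation between human biometric characteristics and smoking status. The data were health checkup records provided by the National Health Insurance Service and compiled on Kaggle. We expected serum information to show a strong relationship in training the classification model, but the actual results showed that sex had the greatest influence.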

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.492-496 / 2010
  • Wikipedia, an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is high. In this paper, we propose a method to automatically construct a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies from the class information in each Wikipedia article. Second, we map the title of each article to our class hierarchies and calculate the entropy of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy exceeds a threshold. In our experiments, the method achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
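The entropy-based filtering step can be sketched as below. The abstract does not say which distribution the root-node entropy is computed over, so this sketch assumes it is the distribution of named entity labels among the article titles mapped under each root. The hierarchy names, labels, and threshold are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_hierarchies(hierarchies, threshold):
    """Keep only hierarchies whose root-node entropy is at most the threshold."""
    return {root: labels for root, labels in hierarchies.items()
            if entropy(labels) <= threshold}


# Hypothetical hierarchies: root name -> labels of the titles mapped under it.
hierarchies = {
    "Cities": ["LOC", "LOC", "LOC", "LOC"],   # pure, entropy 0
    "Mixed": ["LOC", "PER", "ORG", "PER"],    # impure, entropy 1.5 bits
}
kept = filter_hierarchies(hierarchies, threshold=1.0)
```

A low-entropy root means the titles under it mostly share one entity type, so the hierarchy is a reasonably clean source of dictionary entries.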

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.19D no.4 / pp.263-270 / 2012
  • With the development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, and classifying those documents automatically is becoming increasingly important. In this study, we propose an effective method to extract document features based on Hangeul morpheme and keyword analyses, and to classify unstructured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, then form the keyword set based on term frequency and subject-discriminating power, and score each keyword using the discriminating power. We then generate the classification model using commercial software that implements the decision tree, neural network, and SVM (support vector machine). Experimental results show that the proposed feature extraction method performs well in classifying web documents by subject, with an average precision of 0.90 and recall of 0.84 in the case of the decision tree.

ECG-based Biometric Authentication Using Random Forest (랜덤 포레스트를 이용한 심전도 기반 생체 인증)

  • Kim, JeongKyun;Lee, Kang Bok;Hong, Sang Gi
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.6 / pp.100-105 / 2017
  • This work presents an ECG biometric recognition system for biometric authentication. ECG biometric approaches fall into two major categories: fiducial-based and non-fiducial-based methods. This paper proposes a new non-fiducial framework using the discrete cosine transform (DCT) and a Random Forest (RF) classifier. With the DCT, most of the signal information tends to be concentrated in a few low-frequency components. To build the input to the Random Forest, feature vectors of ECG heartbeats are constructed from the first 40 DCT coefficients. RF is based on the computation of a large number of decision trees. It is relatively fast, robust, and inherently suitable for multi-class problems. Furthermore, the threshold that trades off admission against rejection of an ID is handled inside the RF classifier. As a result, the proposed method achieves a 99.9% recognition rate when tested on the MIT-BIH NSRDB.
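The feature extraction step (keeping the first 40 DCT coefficients of a heartbeat) can be sketched as follows. This assumes an unnormalized DCT-II written in pure Python and a synthetic heartbeat; the paper's actual preprocessing, DCT normalization, and segment length are not given in the abstract, and the Random Forest stage is not shown.

```python
import math


def dct_ii(signal):
    """Unnormalized DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct_features(heartbeat, n_coeffs=40):
    """Keep only the first n_coeffs low-frequency DCT coefficients."""
    return dct_ii(heartbeat)[:n_coeffs]


# Synthetic single-peak "heartbeat" of 200 samples (illustration only).
beat = [math.exp(-((i - 100) / 8.0) ** 2) for i in range(200)]
features = dct_features(beat)  # 40-dimensional vector for a classifier
```

Truncating to the low-frequency coefficients shrinks a 200-sample segment to a 40-value vector while retaining most of the energy of a smooth signal.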

Predictive Analysis of Problematic Smartphone Use by Machine Learning Technique

  • Kim, Yu Jeong;Lee, Dong Su
    • Journal of the Korea Society of Computer and Information / v.25 no.2 / pp.213-219 / 2020
  • In this paper, we propose a classification analysis method for diagnosing and predicting problematic smartphone use, in order to provide policy data on a problem that is worsening year after year. We also attempt to identify the key variables that affect the classification. For this purpose, we compared the classification rates of three machine learning methods: Decision Tree, Random Forest, and Support Vector Machine. The data came from 25,465 respondents to the '2018 Problematic Smartphone Use Survey' provided by the Korea Information Society Agency and were analyzed using the R statistical package (ver. 3.6.2). The three classification techniques showed similar classification rates, and no overfitting was observed. The Support Vector Machine had the highest classification rate, followed by Decision Tree and Random Forest. Among smartphone use types, the top three variables affecting the classification rate were the Life Service type, the Information Seeking type, and the Leisure Activity Seeking type.

An Analytical Study on Automatic Classification of Domestic Journal articles Using Random Forest (랜덤포레스트를 이용한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.36 no.2 / pp.57-77 / 2019
  • Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I performed various experiments on the main factors, such as the number of trees, feature selection, and training set size, that affect the performance of automatically assigning class labels to domestic journal articles. Through these experiments, I explored ways to optimize the performance of RF for imbalanced datasets in real environments. For the automatic classification of domestic journal articles, RF can be expected to perform best when using a tree number interval of 100~1000(C), a small feature set (10%) selected by the chi-square statistic (CHI), and the largest training sets (9-10 years).
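Chi-square feature selection of the kind mentioned above can be sketched with the standard 2x2 term/class statistic. The term names and document counts below are hypothetical, and the abstract does not specify how the study computed or aggregated its CHI scores.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_top_fraction(scores, fraction=0.10):
    """Return the highest-scoring terms, keeping at least one."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical counts for three terms against one class (illustration only).
tables = {
    "cataloging": (10, 0, 0, 10),  # perfectly associated with the class
    "method": (5, 5, 5, 5),        # independent of the class
    "library": (8, 3, 2, 7),
}
scores = {term: chi_square(*counts) for term, counts in tables.items()}
selected = select_top_fraction(scores, fraction=0.10)
```

Terms that occur independently of the class score near zero and are dropped, which is how a small (10%) feature set can still retain the most discriminating vocabulary.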

A Study on Classification Support Expert System Design based on Note Analysis for DDC 20 Tables (DDC 20판의 주기 분석에 근거한 보조표 분류지원 전문가시스템 설계에 관한 연구)

  • 김상미;남태우
    • Proceedings of the Korean Society for Information Management Conference / 1994.12a / pp.129-132 / 1994
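  • The 20th edition of DDC provides notes of various forms in many places to support use of the auxiliary tables. These notes contain important grammatical rules that help new disciplines overcome overlap with the classification scheme of the previous edition and that enable accurate document classification. However, because the variety of notes is not well organized, they remain underused. Accordingly, this study analyzes the usage notes for DDC 20 auxiliary tables T1 (Standard Subdivisions) and T2 (Geographic Areas, Historical Periods, Persons) with attention to their statistical frequency, classifies the analyzed notes by type, and defines a grammar for generating class numbers for each type. It then illustrates the exact classification process by applying derivation trees to that grammar, and designs a model of a classification support expert system that can serve as an automatic classification system.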

Tire Tread Pattern Classification Using Fuzzy Clustering Algorithm (퍼지 클러스터링 알고리즘을 이용한 타이어 접지면 패턴의 분류)

  • 강윤관;정순원;배상욱;김진헌;박귀태
    • Journal of the Korean Institute of Intelligent Systems / v.5 no.2 / pp.44-57 / 1995
  • In this paper, the GFI (Generalized Fuzzy Isodata) and FI (Fuzzy Isodata) algorithms are studied and applied to the tire tread pattern classification problem. The GFI algorithm, which repeatedly groups the partitioned clusters according to the fuzzy partition matrix, is a general form of the FI algorithm. When the binary tree is constructed with the GFI algorithm, cluster validity (that is, whether a partitioned cluster is feasible) is checked, and the binary tree is built by the FDH clustering algorithm. These algorithms perform well in selecting the prototypes of each pattern and in classifying patterns. The edge directions in the preprocessed tire tread image are selected as the pattern features, as they are thought to carry useful information that represents the characteristics of the patterns well.
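The fuzzy partition matrix at the heart of Fuzzy Isodata (better known today as fuzzy c-means) can be sketched as below. This shows only the standard membership update for 1-D points with a fuzzifier `m`; it is not the paper's GFI or FDH procedure, and the points and centers are hypothetical.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Fuzzy partition matrix: one row per point, one column per cluster center."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        zeros = dists.count(0.0)
        if zeros:
            # The point sits exactly on a center: give it full membership there.
            rows.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        rows.append([
            1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists
        ])
    return rows


# Hypothetical 1-D feature values and two cluster centers (illustration only).
partition = fuzzy_memberships([0.0, 0.25, 0.5, 1.0], centers=[0.0, 1.0])
```

Each row sums to 1, so a point between two centers is shared between clusters in proportion to its relative closeness rather than being assigned to one outright.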

The Classification of Metabolic Type Using Tissue Mineral Analysis (모발분석 결과를 이용한 대사형의 분류)

  • 한근식
    • The Korean Journal of Applied Statistics / v.17 no.2 / pp.197-204 / 2004
  • We sent 1,000 hair samples to the USA and received the results from there, because programs for interpreting Tissue Mineral Analysis (TMA) results are not yet available in Korea. This study therefore analyzes the relationship between the hair of Koreans and minerals and builds a classification system. To this end, we first coded the patients' results and used statistical methods to classify the analyzed results, namely the interrelationships between minerals and dietary situation and between heavy metals and diseases. Finally, we classified 8 metabolic types using a decision tree classifier.