• Title/Abstract/Keywords: Classification rule

544 search results (processing time: 0.026 s)

A multi-layered neural network learning procedure and generating architecture method for improving neural network learning capability

  • 이대식;이종태
    • Korean Management Science Review, Vol. 18, No. 2, pp. 25-38, 2001
  • The well-known back-propagation algorithm for multi-layered neural networks has been applied successfully to pattern classification problems with remarkable flexibility. Recently, the multi-layered neural network has also been used as a powerful data mining tool. Nevertheless, in many cases with complex classification boundaries, successful learning is not guaranteed, and long learning times and attraction to local minima restrict field applications. In this paper, an improved learning procedure for multi-layered neural networks is proposed. The procedure is based on the generalized delta rule, but it differs in that the network architecture is not fixed but is enlarged during learning. That is, the number of hidden nodes or hidden layers is increased to help find the classification boundary, and this process is controlled by entropy evaluation. The learning speed and pattern classification performance are analyzed and compared with those of the back-propagation algorithm.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods, Vol. 13, No. 1, pp. 151-165, 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been claimed to perform better than cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
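
To make the comparison above concrete, here is a minimal sketch (not the authors' code) of the two estimators for a 1-nearest-neighbour rule. The 1-NN rule is an assumption chosen because its apparent (resubstitution) error is zero, which is the highly adaptive case the abstract studies. The 1-D toy data and all function names are hypothetical, and the bootstrap variant shown is the leave-one-out bootstrap, in which each point is scored only by models trained on resamples that did not draw it.

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbour rule on 1-D points: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def apparent_error(data):
    """Resubstitution error; for 1-NN each point is its own nearest neighbour."""
    return sum(nn_predict(data, x) != y for x, y in data) / len(data)

def kfold_cv_error(data, k, rng):
    """k-fold cross-validation estimate of the misclassification rate."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    errors = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        errors += sum(nn_predict(train, data[i][0]) != data[i][1] for i in fold)
    return errors / len(data)

def loo_bootstrap_error(data, n_boot, rng):
    """Leave-one-out bootstrap: score each point only with models whose
    bootstrap resample did not contain it, then average per-point error rates."""
    n = len(data)
    wrong, counted = [0] * n, [0] * n
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]
        in_sample = set(draw)
        train = [data[i] for i in draw]
        for i in range(n):
            if i not in in_sample:
                counted[i] += 1
                wrong[i] += nn_predict(train, data[i][0]) != data[i][1]
    rates = [w / c for w, c in zip(wrong, counted) if c > 0]
    return sum(rates) / len(rates)

# Hypothetical toy data: two overlapping 1-D Gaussian classes, 30 points each.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(30)] + \
       [(rng.gauss(1.0, 1.0), 1) for _ in range(30)]
print("apparent:", apparent_error(data))
print("10-fold CV:", kfold_cv_error(data, 10, rng))
print("LOO bootstrap:", loo_bootstrap_error(data, 200, rng))
```

The apparent error is zero by construction, while both resampling estimates are positive; rerunning with different seeds gives a rough feel for the variance of each estimator.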

Automated Speech Analysis Applied to Sasang Constitution Classification

  • 강재환;유종향;이혜정;김종열
    • Phonetics and Speech Sciences, Vol. 1, No. 3, pp. 155-163, 2009
  • This paper introduces an automatic voice classification system for diagnosing individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). To develop this algorithm, we used the voices of 473 speakers and extracted a total of 144 speech features from speech data consisting of five sustained vowels and one sentence. The classification system, based on a rule-based algorithm derived from a nonparametric statistical method, makes binary negative decisions. In conclusion, the system reached a diagnosis for 55.7% of the speech data, and 72.8% of those diagnoses were correct negative decisions.
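
The two figures reported above are a coverage (the share of samples on which the system commits to a decision) and an accuracy measured only over those decisions. The sketch below computes both for a hypothetical abstaining rule that rules out a constitution type when one speech feature falls outside a range; the labels "A" and "B", the feature values, and the thresholds are invented for illustration and are not the paper's rules.

```python
from typing import List, Optional, Tuple

def negative_rule(feature: float, low: float, high: float) -> Optional[str]:
    """Hypothetical abstaining rule: rule out type A for low feature values,
    rule out type B for high ones, and make no decision in between."""
    if feature < low:
        return "not_A"
    if feature > high:
        return "not_B"
    return None

def coverage_and_accuracy(samples: List[Tuple[float, str]],
                          low: float, high: float) -> Tuple[float, float]:
    decisions = [(negative_rule(f, low, high), truth) for f, truth in samples]
    made = [(d, t) for d, t in decisions if d is not None]
    coverage = len(made) / len(samples)
    # A negative decision "not_X" is correct whenever the true type is not X.
    correct = sum(d != "not_" + t for d, t in made)
    accuracy = correct / len(made) if made else 0.0
    return coverage, accuracy

# Invented (feature value, true type) pairs.
samples = [(0.2, "A"), (0.1, "B"), (0.5, "A"), (0.9, "A"), (0.95, "B"), (0.6, "B")]
coverage, accuracy = coverage_and_accuracy(samples, low=0.3, high=0.8)
print(f"coverage={coverage:.3f}, accuracy among decisions={accuracy:.3f}")
# prints coverage=0.667, accuracy among decisions=0.500
```

Keeping coverage and conditional accuracy separate mirrors how the abstract reports results: a rule can trade coverage for accuracy by widening the abstention band.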

Generation of Efficient Fuzzy Classification Rules Using Evolutionary Algorithm with Data Partition Evaluation

  • 류정우;김성은;김명원
    • Journal of Korean Institute of Intelligent Systems, Vol. 18, No. 1, pp. 32-40, 2008
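  • When data attribute values are continuous and ambiguous, expressing classification rules as fuzzy rules is both very useful and effective. However, it is difficult to determine membership functions that yield effective fuzzy classification rules. This paper proposes a method that automatically generates effective fuzzy classification rules using an evolutionary algorithm. The proposed method generates initial membership functions according to the class distribution using supervised clustering, and then evolves these initial membership functions so that accurate and concise rules can be generated. In addition, to improve the time efficiency of the evolutionary algorithm, we propose an evolution method with data partition evaluation. In this method, the full training data set is divided into several partial training sets, and each individual is evaluated on a randomly selected partial training set instead of the full training data. Comparative experiments against existing methods on UCI benchmark data showed that the proposed method is effective on average. On the KDD'99 Cup intrusion detection data, it also achieved a 1.54% higher recognition rate and a 20.8% lower detection cost than the KDD'99 Cup winner, and the data partition evaluation reduced individual evaluation time by about 70%.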
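
A minimal sketch of the data partition evaluation idea, under simplifying assumptions: each individual is a single decision threshold on one continuous attribute (standing in for the evolved fuzzy membership functions), fitness is plain accuracy, and the evolutionary loop is a simple truncation-selection scheme with Gaussian mutation. Only the evaluation step mirrors the abstract: each individual is scored on one randomly chosen partial training set rather than on the full training data. All names and parameters are hypothetical.

```python
import random

def make_partitions(data, n_parts, rng):
    """Shuffle the training set and split it into n_parts disjoint subsets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]

def accuracy(threshold, subset):
    """Hypothetical individual: predict class 1 when x > threshold."""
    return sum((x > threshold) == (y == 1) for x, y in subset) / len(subset)

def evolve(data, n_parts=5, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    parts = make_partitions(data, n_parts, rng)
    pop = [rng.uniform(-3.0, 3.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Data partition evaluation: each individual is scored on one randomly
        # chosen partial training set instead of the full training data.
        scored = sorted(((accuracy(t, rng.choice(parts)), t) for t in pop), reverse=True)
        parents = [t for _, t in scored[: pop_size // 2]]
        children = [p + rng.gauss(0.0, 0.3) for p in parents]
        pop = parents + children
    # The returned individual is chosen on the full training set.
    return max(pop, key=lambda t: accuracy(t, data))

# Hypothetical data: class 0 centred at -1, class 1 centred at +1.
rng = random.Random(42)
data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(100)] + \
       [(rng.gauss(1.0, 1.0), 1) for _ in range(100)]
best = evolve(data)
print(f"threshold={best:.2f}, full-data accuracy={accuracy(best, data):.2f}")
```

With `n_parts` partitions, each fitness evaluation touches roughly `1/n_parts` of the data; because the final winner is re-scored on the full set, the noise introduced by partial evaluation does not alone decide the returned individual.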

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test

  • 윤태균;이관수
    • The Transactions of the Korean Institute of Electrical Engineers, Vol. 57, No. 6, pp. 1058-1062, 2008
  • In a clinical decision support system (CDSS), unlike a rule-based expert method, an appropriate data-driven machine learning method can easily provide information on individual features (clinical tests) for disease classification. However, currently developed methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel sets of clinical tests that sharply differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is applied both as the classifier and as the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using the breast cancer, diabetes, and heart disease data sets from the UCI Machine Learning Repository. Tests on the same data sets show that the proposed system can successfully select a significant set of clinical tests for each disease.
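
The backward-elimination loop described above can be sketched as follows. As a stated substitution, this version uses a nearest-centroid classifier and drops, at each step, the feature whose removal increases the error least, instead of a random forest and its importance measure; stopping once every possible removal would raise the error is also an assumption. The toy data (one informative feature, two noise features) and all names are hypothetical.

```python
import random

def centroid_error(rows, labels, features):
    """Resubstitution error of a nearest-centroid classifier using only `features`."""
    classes = sorted(set(labels))
    cents = {}
    for c in classes:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        cents[c] = [sum(r[f] for r in members) / len(members) for f in features]

    def predict(r):
        return min(classes, key=lambda c: sum((r[f] - m) ** 2
                                              for f, m in zip(features, cents[c])))

    return sum(predict(r) != lab for r, lab in zip(rows, labels)) / len(rows)

def backward_elimination(rows, labels, n_features):
    """Greedily drop the feature whose removal raises the error least; stop
    when every possible removal would raise the error."""
    selected = list(range(n_features))
    history = [(list(selected), centroid_error(rows, labels, selected))]
    while len(selected) > 1:
        trials = [(centroid_error(rows, labels, [f for f in selected if f != g]), g)
                  for g in selected]
        err, drop = min(trials)
        if err > history[-1][1]:
            break
        selected.remove(drop)
        history.append((list(selected), err))
    return selected, history

# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
rng = random.Random(0)
labels = [0] * 40 + [1] * 40
rows = [[rng.gauss(5.0 * y, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        for y in labels]
selected, history = backward_elimination(rows, labels, 3)
for feats, err in history:
    print(feats, round(err, 3))
```

Recording the error after every removal, as `history` does, is what lets a system of this kind report which subset of tests keeps classification error low.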

COMPOUNDED METHOD FOR LAND COVERING CLASSIFICATION BASED ON MULTI-RESOLUTION SATELLITE DATA

  • HE WENJU;QIN HUA;SUN WEIDONG
    • Korean Society of Remote Sensing: Conference Proceedings, 2005 Proceedings of ISRS 2005, pp. 116-119, 2005
  • For the synthetic estimation of land covering parameters, or compounded land covering classification from multi-resolution satellite data, previous research mainly adopted linear or nonlinear regression models to describe how land covering parameters change as spatial resolution degrades, in order to improve the retrieval accuracy of global land covering parameters from lower-resolution satellite data. However, these methods cannot faithfully represent, at the algorithmic level, the complementary characteristics of the spatial resolutions of different satellite data. To resolve this problem, this paper proposes a new compounded land covering classification method at the algorithmic level for multi-resolution satellite data. First, based on unsupervised clustering of the higher-resolution satellite data, the likelihood distribution scatterplot of each cover type is obtained from the many-to-one spatial correspondence between the higher- and lower-resolution satellite data in some local test regions. Next, the Parzen window approach is used to derive the real likelihood functions from the scatterplots. Finally, the likelihood functions are extended from the local test regions to the full coverage area of the lower-resolution satellite data, and that area is classified under the maximum likelihood rule. Experimental results indicate that the proposed compounded method can improve the classification accuracy of large-scale lower-resolution satellite data with the support of local-area higher-resolution satellite data.
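
The Parzen-window and maximum-likelihood steps can be illustrated in one dimension. This sketch assumes a Gaussian kernel with a hand-picked bandwidth `h`, equal class priors, and a few invented per-class samples (the class names and values are hypothetical); the paper works with likelihood scatterplots derived from multi-resolution imagery rather than a single scalar value.

```python
import math

def parzen_density(samples, h):
    """1-D Parzen-window density estimate with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density

def ml_classify(x, densities):
    """Maximum likelihood rule (equal priors): class with the largest estimated likelihood at x."""
    return max(densities, key=lambda c: densities[c](x))

# Invented per-class training values, e.g. a reflectance-like feature.
training = {
    "water": [0.05, 0.08, 0.10, 0.12],
    "urban": [0.25, 0.27, 0.28, 0.30],
    "forest": [0.40, 0.42, 0.45, 0.50],
}
densities = {c: parzen_density(v, h=0.03) for c, v in training.items()}
print(ml_classify(0.09, densities), ml_classify(0.47, densities))  # water forest
```

The bandwidth controls the smoothness of each estimated likelihood: too small and the density collapses onto the training points, too large and neighbouring classes blur together.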

Rough Set-Based Approach for Automatic Emotion Classification of Music

  • Baniya, Babu Kaji;Lee, Joonwhoan
    • Journal of Information Processing Systems, Vol. 13, No. 2, pp. 400-416, 2017
  • Music emotion is an important component of music information retrieval and computational musicology. This paper proposes an approach for automatic emotion classification based on rough set (RS) theory. In the proposed approach, four different sets of music features are extracted, representing dynamics, rhythm, spectral content, and harmony. From these features, five different statistical parameters are considered as attributes, including the central moments of each feature up to the 4th order and the covariance components between pairs of features. The large number of attributes is controlled by the RS-based approach, in which superfluous features are removed to obtain the indispensable ones. In addition, the RS-based approach makes it possible to visualize which attributes play a significant role in the generated rules and to determine the strength of each rule for classification. Experiments were performed to find out which audio features, and which of the statistical parameters derived from them, are important for emotion classification. The resulting indispensable attributes and the usefulness of the covariance components are also discussed. With all statistical parameters, the overall classification accuracy on a pair of datasets was better than that of currently existing methods.
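
The rough-set attribute reduction mentioned above rests on the dependency degree γ(B) = |POS_B(D)| / |U|, the fraction of objects whose B-indiscernibility class lies entirely inside one decision class; an attribute is superfluous if removing it leaves γ unchanged. The sketch below computes this on a tiny, invented decision table with already-discretized attribute values (the paper's attributes are continuous statistical parameters, which would first need discretization) and applies a simple greedy reduction. The greedy order is one possible strategy, not necessarily the authors'.

```python
from collections import defaultdict

def partition(table, attrs):
    """Indiscernibility classes: group row indices by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(table, attrs, decision):
    """gamma(attrs -> decision): share of rows whose indiscernibility class
    contains a single decision value (the positive region)."""
    pos = sum(len(block) for block in partition(table, attrs)
              if len({table[i][decision] for i in block}) == 1)
    return pos / len(table)

def reduct(table, attrs, decision):
    """Greedy reduction: drop each attribute whose removal keeps gamma unchanged."""
    full = dependency(table, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if trial and dependency(table, trial, decision) == full:
            kept = trial
    return kept

# Invented, already-discretized decision table.
table = [
    {"tempo": "fast", "mode": "major", "energy": "high", "emotion": "happy"},
    {"tempo": "fast", "mode": "minor", "energy": "high", "emotion": "angry"},
    {"tempo": "slow", "mode": "minor", "energy": "low", "emotion": "sad"},
    {"tempo": "slow", "mode": "major", "energy": "low", "emotion": "calm"},
    {"tempo": "fast", "mode": "major", "energy": "high", "emotion": "happy"},
]
attrs = ["tempo", "mode", "energy"]
print(dependency(table, attrs, "emotion"))  # 1.0
print(reduct(table, attrs, "emotion"))      # ['mode', 'energy']
```

Here "tempo" and "energy" carry the same information in the toy table, so one of them is dispensable, while "mode" cannot be dropped without losing consistency.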

Using Genetic Rule-Based Classifier System for Data Mining

  • 한명묵
    • Journal of Internet Computing and Services, Vol. 1, No. 1, pp. 63-72, 2000
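  • Data mining is the process of extracting hidden knowledge or useful information from vast amounts of data. Data mining algorithms are the product of long-standing research in statistics, computer science, and machine learning. Which techniques to apply in a given situation depends on the nature of the data mining task to be performed and on the nature of the available data. Data mining comprises several tasks, of which classification is arguably the most representative. Because classification is a basic element of human thinking, it has been studied extensively across many application areas and can be regarded as the first step of problem analysis. This paper proposes a genetic-algorithm-based classification system that is robust on learning problems, and validates its effectiveness by applying it to nDmC, a problem related to the classification function that is important in data mining.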

Identifying Classes for Classification of Potential Liver Disorder Patients by Unsupervised Learning with K-means Clustering

  • 김준범;오교중;오근휘;최호진
    • Korean Institute of Information Scientists and Engineers: Conference Proceedings, KIISE 2011 Korea Computer Congress Proceedings, Vol. 38, No. 1(C), pp. 195-197, 2011
  • This research deals with an issue of preventive medicine in bioinformatics. Liver conditions can be diagnosed reasonably well, helping to prevent liver cirrhosis, by classifying liver disorder patients into fatty liver and high-risk groups. The classification proceeds in two steps. First, classification rules are built by clustering five attributes (MCV, ALP, ALT, ASP, and GGT) of the blood test dataset provided by the UCI Repository. The clusters are formed with the K-means method, which handles multidimensional attributes. We then analyze the properties of each cluster, dividing them into fatty liver, high-risk, and normal classes, and generate the classification rules from this analysis. In this paper, we suggest a method to diagnose and predict the liver condition of alcoholic patients according to risk level by applying these classification rules to new blood test results. The K-means classifier was found to be more accurate on the blood test results and provides risk levels ranging from fatty liver to normal liver conditions.
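
The two-step procedure (cluster, then turn the clusters into a classification rule) can be sketched with plain K-means and a nearest-centroid rule. Assumptions: 2-D toy points stand in for the five blood-test attributes, centroids are seeded with farthest-point initialization (the abstract does not say how), and the cluster names are assigned by hand after inspection, as the hypothetical mapping below shows.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means with farthest-point seeding (an assumption)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [mean(cl) if cl else centroids[j] for j, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def classify(sample, named_centroids):
    """Classification rule: label of the nearest named centroid."""
    return min(named_centroids, key=lambda name: sq_dist(sample, named_centroids[name]))

# Hypothetical 2-D stand-in for the blood-test attributes: three separated groups.
rng = random.Random(1)
centres = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)) for cx, cy in centres for _ in range(20)]
cents = sorted(kmeans(points, 3))
# Names are assigned by hand after inspecting each cluster (hypothetical mapping).
named = dict(zip(["normal", "fatty liver", "high risk"], cents))
print(classify((4.8, 5.2), named))  # fatty liver
```

Farthest-point seeding is used here only because it keeps this toy example deterministic and avoids two seeds landing in the same group; any standard seeding would do for the illustration.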

Personalized Specific Premature Contraction Arrhythmia Classification Method Based on QRS Features in Smart Healthcare Environments

  • Cho, Ik-Sung
    • Journal of IKEEE, Vol. 25, No. 1, pp. 212-217, 2021
  • Premature contraction is the most common type of arrhythmia, and it may lead to serious conditions such as ventricular fibrillation and ventricular tachycardia. Most arrhythmia classification methods have been developed with the primary objective of high detection performance, without taking computational complexity into account. In addition, because ECG signals differ between individuals, diagnosis with a general classification rule degrades performance. It is therefore necessary to design an efficient method that classifies arrhythmia by analyzing the individual's physical condition and that reduces computational cost by accurately detecting a minimal set of feature points based only on QRS features. We propose a method for personalized classification of premature contraction arrhythmia based on QRS features in smart healthcare environments. For this purpose, we detected the R wave through preprocessing and a SOM and selected abnormal signal sets. We also developed an algorithm that classifies premature contraction arrhythmia using the QRS pattern, the RR interval, and a threshold on R-wave amplitude. The performance of R-wave detection and premature ventricular contraction classification was evaluated using the MIT-BIH arrhythmia database, which includes more than 30 PVC (premature ventricular contraction) and PAC (premature atrial contraction) cases. The method achieved an average of 98.24% in R-wave detection and 97.31% in premature ventricular contraction classification.
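
Of the three criteria listed above, the RR-interval check is the easiest to illustrate. The sketch flags a beat as premature when its preceding RR interval is shorter than a fixed fraction of the mean of the previous few intervals; the ratio 0.85, the window of 4, and the R-peak times are hypothetical values, and the paper's QRS-pattern and R-amplitude tests are not modelled.

```python
def premature_beats(r_peaks_ms, ratio=0.85, window=4):
    """Return indices (into r_peaks_ms) of beats whose preceding RR interval is
    shorter than `ratio` times the mean of the previous `window` RR intervals."""
    rr = [b - a for a, b in zip(r_peaks_ms, r_peaks_ms[1:])]
    flagged = []
    for i in range(window, len(rr)):
        baseline = sum(rr[i - window:i]) / window
        if rr[i] < ratio * baseline:
            flagged.append(i + 1)  # rr[i] ends at r_peaks_ms[i + 1]
    return flagged

# Hypothetical R-peak times (ms): regular 800 ms rhythm with one early beat
# at 3700 ms followed by a longer pause.
peaks = [0, 800, 1600, 2400, 3200, 3700, 4800, 5600]
print(premature_beats(peaks))  # [5]
```

Using a per-person running baseline rather than a fixed population threshold is one simple way to reflect the personalization argument in the abstract.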