• Title/Summary/Keyword: Statistical classification

Empirical Bayes Posterior Odds Ratio for Heteroscedastic Classification

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.16 no.2 / pp.92-101 / 1987
  • Our interest is to assess the relative odds or probability that a multivariate observation Z belongs to one of k multivariate normal populations with unequal covariance matrices. We derive the empirical Bayes posterior odds ratio for the classification rule when the population parameters are unknown. It is a generalization of the posterior odds ratio suggested by Geisser (1964). Unlike many techniques from the sampling viewpoint, the classification rule does not require complicated distribution theory. The proposed posterior odds ratio is compared with Geisser's posterior odds ratio through a Monte Carlo study. The results show that the empirical Bayes posterior odds ratio generally performs better than Geisser's, and that its advantage is most pronounced when the dimension of Z is large and the training sample is small.
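
As a rough illustration of what a posterior odds ratio for heteroscedastic normal populations looks like, the sketch below computes plug-in log posterior odds for two bivariate normal populations with unequal covariance matrices. It is not the empirical Bayes estimator of the paper, nor Geisser's ratio: it simply substitutes sample means and covariances into the normal densities. The function names, the equal prior probabilities, and the toy training samples are all assumptions made for this example.

```python
from math import log

def sample_mean_cov(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def log_density_kernel(z, mean, cov):
    """Log bivariate normal density at z, without the constant -log(2*pi)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    # Quadratic form (z - mean)' S^{-1} (z - mean) using the 2x2 inverse.
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * log(det) - 0.5 * quad

def log_posterior_odds(z, sample_i, sample_j, prior_i=0.5, prior_j=0.5):
    """Plug-in log odds that z belongs to population i rather than j."""
    mean_i, cov_i = sample_mean_cov(sample_i)
    mean_j, cov_j = sample_mean_cov(sample_j)
    # The -log(2*pi) constant cancels in the ratio of the two densities.
    return (log(prior_i / prior_j)
            + log_density_kernel(z, mean_i, cov_i)
            - log_density_kernel(z, mean_j, cov_j))

# Hypothetical training samples: a tight group and a more dispersed group.
group_a = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (-0.5, 0.2), (0.2, -0.6)]
group_b = [(4.0, 4.0), (6.0, 5.0), (5.0, 7.0), (3.0, 5.5), (5.5, 3.0)]

odds_near_a = log_posterior_odds((0.3, 0.2), group_a, group_b)
odds_near_b = log_posterior_odds((5.0, 5.0), group_a, group_b)
```

A positive value favors the first population and a negative value the second. With small training samples and high dimension, plug-in covariance estimates of this kind become unstable, which is the setting where the abstract reports the largest gain for the empirical Bayes ratio.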

Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

  • Cha, Woon Ock;Huh, Moon Yul
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.879-894 / 2003
  • We evaluated the efficiency of applying attribute selection methods and prior discretization to supervised learning, modelled by C4.5 and Naive Bayes. Three databases were obtained from the UCI data archive, each consisting of continuous attributes plus one decision attribute. Four methods were used for attribute selection: MDI, ReliefF, Gain Ratio, and a consistency-based method. MDI and ReliefF can be used for both continuous and discrete attributes, but the other two methods can be used only for discrete attributes. Discretization was performed using the Fayyad and Irani method. To investigate the effect of noise in the database, noise was introduced into the data sets at levels of 10% or 20%, and both the noisy and the noise-free data were processed through the steps of attribute selection, discretization, and classification. The results indicate that classifying the data on the selected attributes yields higher accuracy than classifying the full data set, and that prior discretization does not lower the accuracy.
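
Of the four selection methods named above, Gain Ratio is the simplest to write down: the information gain of an attribute divided by its split information (the entropy of the attribute's own value distribution). The sketch below is a minimal version for discrete attributes only, matching the restriction stated in the abstract. The function names and the four-row toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of a discrete attribute divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0

# Toy data: one attribute that determines the class and one that does not.
labels = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

ratio_informative = gain_ratio(informative, labels)
ratio_uninformative = gain_ratio(uninformative, labels)
```

Attributes would then be ranked by this score and the top-ranked ones kept; a continuous attribute would first have to be discretized, for example by the Fayyad and Irani method mentioned in the abstract.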

Face image classification by SVM

  • Park, Hye-Jeong;Sim, Ju-Yong;Kim, Mun-Tae;O, Gwang-Sik;Kim, Dae-Hak
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.155-159 / 2003
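  • Support vector machines (SVMs) have recently found many applications in machine learning, and much research is under way, particularly in classification and regression analysis. In this paper we use an SVM to classify input image data. Given an RGB color image, the image itself is treated as the input pattern regardless of its size, and the result of SVM training (the estimated weights and bias) is used to classify whether the input image shows a person. The validity of the proposed method was assessed by applying it to 152 images.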

Recognize Handwritten Urdu Script Using Kohenen Som Algorithm

  • Khan, Yunus;Nagar, Chetan
    • International Journal of Ocean System Engineering / v.2 no.1 / pp.57-61 / 2012
  • In this paper we use the Kohonen neural network based Self Organizing Map (SOM) algorithm for Urdu character recognition. The Kohonen network is reported to perform more efficiently than other approaches. Classification is used to recognize handwritten Urdu characters. Pre-classification reduces the number of candidate characters to a subset of the total character set, so the proposed algorithm attempts to group similar characters. Members of each pre-classified group are further analyzed using a statistical classifier for final recognition. A recognition rate of around 79.9% was achieved for the first choice and more than 98.5% for the top three choices. The results show that the proposed Kohonen SOM algorithm yields promising output and is feasible compared with existing techniques.

Comparison of BP and SOM as a Classification of PD Source (부분방전원의 분류에 있어서 BP와 SOM의 비교)

  • 박성희;강성화;임기조
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.17 no.9 / pp.1006-1012 / 2004
  • In this paper, neural networks are studied for classifying partial discharge (PD) sources in XLPE power cable specimens. Two learning schemes are used for classification: BP (back-propagation algorithm) and SOM (self-organizing map, Kohonen network). Three defect models are made using treeing discharge sources in the specimen as PD sources. From data acquired with a computer-aided discharge analyser, statistical and other discharge parameters are calculated to discriminate between the different models of discharge sources. These distribution characteristics are then used to classify the PD sources with the two neural network schemes. In conclusion, the recognition efficiency of BP is superior to that of SOM.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.151-165 / 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
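
The comparison described above can be sketched on a toy problem. The code below uses a 1-nearest-neighbour rule, whose apparent misclassification rate is zero, as a stand-in for a highly adaptive classifier, and computes a leave-one-out cross-validation estimate and a .632 bootstrap estimate. The abstract does not say which classifiers or which bootstrap variants the authors used, so the 1-NN rule, the .632 weighting, the synthetic one-dimensional data, and all function names are assumptions for illustration.

```python
import random

def nearest_neighbor_predict(train, x):
    """Label of the training point closest to x (1-D, 1-nearest-neighbour)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def error_rate(train, test):
    """Fraction of test points misclassified by 1-NN fitted on train."""
    wrong = sum(nearest_neighbor_predict(train, x) != y for x, y in test)
    return wrong / len(test)

def loo_cv_error(data):
    """Leave-one-out cross-validation estimate of the error rate."""
    errors = [error_rate(data[:i] + data[i + 1:], [data[i]])
              for i in range(len(data))]
    return sum(errors) / len(errors)

def bootstrap_632_error(data, n_boot=200, seed=0):
    """The .632 bootstrap estimate: 0.368 * apparent + 0.632 * out-of-bag."""
    rng = random.Random(seed)
    n = len(data)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if not oob:  # skip the rare resample that contains every point
            continue
        oob_errors.append(error_rate([data[i] for i in idx], oob))
    out_of_bag = sum(oob_errors) / len(oob_errors)
    return 0.368 * error_rate(data, data) + 0.632 * out_of_bag

# Synthetic two-class data: 15 points per class from shifted normals.
rng = random.Random(1)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(15)]
        + [(rng.gauss(1.5, 1.0), 1) for _ in range(15)])

apparent = error_rate(data, data)
cv_estimate = loo_cv_error(data)
boot_estimate = bootstrap_632_error(data)
```

Because the apparent error is zero here, the .632 estimate reduces to 0.632 times the out-of-bag error, which illustrates how a bootstrap estimate that leans on the apparent error can be biased for a rule that fits its training data perfectly. Repeating the experiment over many simulated data sets would be needed to compare the bias and variance of the two estimators.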

A Resetting Scheme for Process Parameters using the Mahalanobis-Taguchi System

  • Park, Chang-Soon
    • The Korean Journal of Applied Statistics / v.25 no.4 / pp.589-603 / 2012
  • The Mahalanobis-Taguchi system (MTS) is a statistical tool for classifying the normal group and the abnormal group in multivariate data structures. In addition to the classification itself, the MTS includes a method for selecting the variables useful for the classification. This method is especially efficient when the abnormal group data are scattered without a specific directionality. When a feedback adjustment procedure that controls the process input variables through measurements of the process output is not practically possible, a reset procedure can be an alternative. This article proposes a reset procedure using the MTS, together with a method that uses the contribution to identify the input variables to reset. Identifying the root-cause parameters with the existing dimension-reduced contribution tends to be difficult because of the variety of correlation relationships in multivariate data structures. However, an improved decision becomes possible when it is used together with the location-centered contribution and the individual-parameter contribution.
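
The classification step of the MTS rests on the Mahalanobis distance of an observation from the normal group. The sketch below covers only that step for two variables: it estimates the mean and covariance from normal-group data and returns the squared Mahalanobis distance divided by the number of variables, a scaling commonly used in the MTS and assumed here. The variable selection, reset procedure, and contribution measures proposed in the article are not reproduced, and the function names and data values are hypothetical.

```python
def fit_normal_group(points):
    """Mean and unbiased covariance (sxx, sxy, syy) of 2-D normal-group data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def scaled_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance from center, divided by k = 2 variables."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    # (point - center)' S^{-1} (point - center) using the 2x2 inverse.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return squared / 2

# Hypothetical normal-group measurements with positively correlated variables.
normal_group = [(10, 20), (11, 22), (9, 19), (10, 21),
                (12, 23), (11, 21), (9, 18), (10, 20)]
center, cov = fit_normal_group(normal_group)

md_typical = scaled_mahalanobis((10, 20), center, cov)
md_abnormal = scaled_mahalanobis((12, 18), center, cov)  # breaks the correlation
```

The second point lies within the range of each variable taken separately but violates the correlation between them, so its distance is far larger than that of the typical point. An observation would be flagged as abnormal when its distance exceeds a threshold chosen by the analyst.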

Comparison of graph clustering methods for analyzing the mathematical subject classification codes

  • Choi, Kwangju;Lee, June-Yub;Kim, Younjin;Lee, Donghwan
    • Communications for Statistical Applications and Methods / v.27 no.5 / pp.569-578 / 2020
  • Various graph clustering methods have been introduced to identify communities in social or biological networks. This paper studies entropy-based and Markov chain-based methods for clustering undirected graphs. We compare the performance of these two clustering methods with that of conventional methods using quality measures of clustering. As a real application, we collect the mathematical subject classification (MSC) codes of research papers from published mathematical databases and construct a weighted code-to-document matrix to which the graph clustering methods are applied. We aim to group MSC codes into the same cluster if they appear together in many papers. We compare the MSC clustering results on several assessment measures and conclude that the Markov chain-based method is suitable for clustering the MSC codes.

Land Cover Classification and Analysis using Remotely Sensed Images Landsat TM with SPOT Panchromatic (Landsat TM과 SPOT Panchromatic 인공위성 영상자료를 이용한 토지피복분류 및 분석)

  • 함종화;윤춘경;김성준
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 1999.10c / pp.765-770 / 1999
  • The purpose of this study is to obtain a land classification map from remotely sensed data, Landsat TM and SPOT panchromatic, and to compare the results with statistical data and with digitized coverage from a topographic paper map. The classification was conducted by the maximum likelihood method with training sets. The best result, the one closest to the statistical data, was obtained from Landsat TM merged with SPOT panchromatic. This is attributed to the more precise training sets made possible by the enhanced spatial resolution of SPOT panchromatic. The classified map may be useful as basic data for estimating pollutant loads at the regional scale of agricultural watersheds.

A study of constitution diagnosis using decision tree method (의사결정나무법을 이용한 체질진단에 관한 연구)

  • Lee, Yong-Seop;Park, Seong-Sik;Park, Eun-Kyung
    • Journal of Sasang Constitutional Medicine / v.13 no.2 / pp.144-155 / 2001
  • With the growing interest in Sasang Constitutional Medicine, its practical use is considered very important in disease prevention and medical treatment. However, constitution classification depends on the doctor's clinical judgment because objective test criteria are lacking. This study tries to improve the objectivity of diagnosis using a statistical method, the decision tree. The decision tree method, a classification technique in statistical analysis, was used instead of discriminant analysis to analyze the results of the QSCCII. As a result, 16 of the 121 QSCCII questions were selected as important, and 21 terminal nodes were built to classify the constitution. Using only the 16 questions identified by the decision tree, we can diagnose and interpret the constitution easily and effectively.
