• Title/Summary/Keyword: Statistical classification

Search Result 1,432, Processing Time 0.029 seconds

Comparison Study of Multi-class Classification Methods

  • Bae, Wha-Soo;Jeon, Gab-Dong;Seok, Kyung-Ha
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.2
    • /
    • pp.377-388
    • /
    • 2007
  • As one of multi-class classification methods, ECOC (Error Correcting Output Coding) method is known to have low classification error rate. This paper aims at suggesting effective multi-class classification method (1) by comparing various encoding methods and decoding methods in ECOC method and (2) by comparing ECOC method and direct classification method. Both SVM (Support Vector Machine) and logistic regression model were used as binary classifiers in comparison.

Evaluation of the classification method using ancestry SNP markers for ethnic group

  • Lee, Hyo Jung;Hong, Sun Pyo;Lee, Soong Deok;Rhee, Hwan seok;Lee, Ji Hyun;Jeong, Su Jin;Lee, Jae Won
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.1
    • /
    • pp.1-9
    • /
    • 2019
  • Various probabilistic methods have been proposed for using interpopulation allele frequency differences to infer the ethnic group of a DNA specimen. The selection of the statistical method is critical because the accuracy of the statistical classification results vary. For the ancestry classification, we proposed a new ancestry evaluation method that estimate the combined ethnicity index as well as compared its performance with various classical classification methods using two real data sets. We selected 13 SNPs that are useful for the inference of ethnic origin. These single nucleotide polymorphisms (SNPs) were analyzed by restriction fragment mass polymorphism assay and followed by classification among ethnic groups. We genotyped 400 individuals from four ethnic groups (100 African-American, 100 Caucasian, 100 Korean, and 100 Mexican-American) for 13 SNPs and allele frequencies that differed among the four ethnic groups. Additionally, we applied our new method to HapMap SNP genotypes for 1,011 samples from 4 populations (African, European, East Asian, and Central-South Asian). Our proposed method yielded the highest accuracy among statistical classification methods. Our ethnic group classification system based on the analysis of ancestry informative SNP markers can provide a useful statistical tool to identify ethnic groups.

Bivariate ROC Curve and Optimal Classification Function

  • Hong, C.S.;Jeong, J.A.
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.4
    • /
    • pp.629-638
    • /
    • 2012
  • We propose some methods to obtain optimal thresholds and classification functions by using various cutoff criterion based on the bivariate ROC curve that represents bivariate cumulative distribution functions. The false positive rate and false negative rate are calculated with these classification functions for bivariate normal distributions.

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.2
    • /
    • pp.135-150
    • /
    • 2021
  • We propose principal differential analysis based classification methods. Computations of squared multiple correlation function (RSQ) and principal differential analysis (PDA) scores are reviewed; in addition, we combine principal differential analysis results with the logistic regression for binary classification. In the numerical study, we compare the principal differential analysis based classification methods with functional principal component analysis based classification. Various scenarios are considered in a simulation study, and principal differential analysis based classification methods classify the functional data well. Gene expression data is considered for real data analysis. We observe that the PDA score based method also performs well.

A Note on Fuzzy Support Vector Classification

  • Lee, Sung-Ho;Hong, Dug-Hun
    • Communications for Statistical Applications and Methods
    • /
    • v.14 no.1
    • /
    • pp.133-140
    • /
    • 2007
  • The support vector machine has been well developed as a powerful tool for solving classification problems. In many real world applications, each training point has a different effect on constructing classification rule. Lin and Wang (2002) proposed fuzzy support vector machines for this kind of classification problems, which assign fuzzy memberships to the input data and reformulate the support vector classification. In this paper another intuitive approach is proposed by using the fuzzy ${\alpha}-cut$ set. It will show us the trend of classification functions as ${\alpha}$ changes.

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.1
    • /
    • pp.89-97
    • /
    • 2021
  • Due to boundedness and sum constraint, compositional data are often transformed by logratio transformation and their transformed data are put into traditional binary classification or discriminant analysis. However, it may be problematic to directly apply traditional multivariate approaches to the transformed data because class distributions are not Gaussian and Bayes decision boundary are not polynomial on the transformed space. In this study, we propose to use flexible classification approaches to transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.

Selection Method of Fuzzy Partitions in Fuzzy Rule-Based Classification Systems (퍼지 규칙기반 분류시스템에서 퍼지 분할의 선택방법)

  • Son, Chang-S.;Chung, Hwan-M.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.3
    • /
    • pp.360-366
    • /
    • 2008
  • The initial fuzzy partitions in fuzzy rule-based classification systems are determined by considering the domain region of each attribute with the given data, and the optimal classification boundaries within the fuzzy partitions can be discovered by tuning their parameters using various learning processes such as neural network, genetic algorithm, and so on. In this paper, we propose a selection method for fuzzy partition based on statistical information to maximize the performance of pattern classification without learning processes where statistical information is used to extract the uncertainty regions (i.e., the regions which the classification boundaries in pattern classification problems are determined) in each input attribute from the numerical data. Moreover the methods for extracting the candidate rules which are associated with the partition intervals generated by statistical information and for minimizing the coupling problem between the candidate rules are additionally discussed. In order to show the effectiveness of the proposed method, we compared the classification accuracy of the proposed with those of conventional methods on the IRIS and New Thyroid Cancer data. From experimental results, we can confirm the fact that the proposed method only considering statistical information of the numerical patterns provides equal to or better classification accuracy than that of the conventional methods.

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.1
    • /
    • pp.61-68
    • /
    • 2014
  • We propose binary classification methods by modifying logistic regression classification. We use variable selection procedures instead of original variables to select the principal components. We describe the resulting classifiers and discuss their properties. The performance of our proposals are illustrated numerically and compared with other existing classification methods using synthetic and real datasets.

Discriminant Analysis of Binary Data by Using the Maximum Entropy Distribution

  • Lee, Jung Jin;Hwang, Joon
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.909-917
    • /
    • 2003
  • Although many classification models have been used to classify binary data, none of the classification models dominates all varying circumstances depending on the number of variables and the size of data(Asparoukhov and Krzanowski (2001)). This paper proposes a classification model which uses information on marginal distributions of sub-variables and its maximum entropy distribution. Classification experiments by using simulation are discussed.

A Note on Linear SVM in Gaussian Classes

  • Jeon, Yongho
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.3
    • /
    • pp.225-233
    • /
    • 2013
  • The linear support vector machine(SVM) is motivated by the maximal margin separating hyperplane and is a popular tool for binary classification tasks. Many studies exist on the consistency properties of SVM; however, it is unknown whether the linear SVM is consistent for estimating the optimal classification boundary even in the simple case of two Gaussian classes with a common covariance, where the optimal classification boundary is linear. In this paper we show that the linear SVM can be inconsistent in the univariate Gaussian classification problem with a common variance, even when the best tuning parameter is used.