• Title/Summary/Keyword: Statistical classification

The use of support vector machines in semi-supervised classification

  • Bae, Hyunjoo;Kim, Hyungwoo;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.193-202 / 2022
  • Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in the machine learning community. The idea is as follows. First, we apply dimension reduction to the unlabeled observations and cluster them to assign labels in the reduced space. SVM is then applied to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend the method to its nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising in semi-supervised classification.
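
The pipeline described in this abstract (reduce dimension, cluster the unlabeled points, assign labels, then fit an SVM on everything) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The choices below are all assumptions, since the abstract does not specify them: PCA with one component computed by power iteration, 2-means clustering, matching clusters to classes by the nearest labeled class mean, and a linear SVM trained by subgradient descent on the original features. All function names and the toy data are hypothetical.

```python
def first_pc(X, iters=100):
    """Mean and leading principal direction of X, via power iteration."""
    d = len(X[0])
    mu = [sum(r[j] for r in X) / len(X) for j in range(d)]
    C = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in X) for j in range(d)]
         for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v

def project(x, mu, v):
    """Coordinate of x along the principal direction v."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mu, v))

def two_means_1d(z, iters=20):
    """Plain 2-means on scalars, started from the minimum and maximum."""
    c = [min(z), max(z)]
    for _ in range(iters):
        groups = [[], []]
        for val in z:
            k = 0 if abs(val - c[0]) <= abs(val - c[1]) else 1
            groups[k].append(val)
        c = [sum(g) / len(g) if g else c[k] for k, g in enumerate(groups)]
    return c

def pseudo_label(X_lab, y_lab, X_unl):
    """Cluster unlabeled points in the reduced space and label each cluster
    by the nearest labeled class mean (an assumed matching rule)."""
    mu, v = first_pc(X_unl)
    z_unl = [project(x, mu, v) for x in X_unl]
    centers = two_means_1d(z_unl)
    class_mean = {}
    for t in set(y_lab):
        zs = [project(x, mu, v) for x, yy in zip(X_lab, y_lab) if yy == t]
        class_mean[t] = sum(zs) / len(zs)
    center_label = [min(class_mean, key=lambda t: abs(class_mean[t] - c))
                    for c in centers]
    return [center_label[0] if abs(z - centers[0]) <= abs(z - centers[1])
            else center_label[1] for z in z_unl]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM by per-sample subgradient descent on hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            active = t * score < 1
            w = [wi - lr * (lam * wi - (t * xi if active else 0.0))
                 for wi, xi in zip(w, x)]
            if active:
                b += lr * t
    return w, b

def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy data (hypothetical): two labeled points, six unlabeled points.
X_lab, y_lab = [(0.0, 0.5), (4.0, 4.5)], [-1, 1]
X_unl = [(0.2, 0.1), (0.5, 0.3), (-0.3, 0.4), (3.8, 4.1), (4.2, 3.9), (4.5, 4.4)]

pseudo = pseudo_label(X_lab, y_lab, X_unl)
w, b = train_linear_svm(X_lab + X_unl, y_lab + pseudo)
print(pseudo, predict((0.1, 0.2), w, b), predict((4.0, 4.0), w, b))
```

A kernel SVM would replace only the final training step; the pseudo-labeling stage stays the same.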

A Study on Statistical Thinking and developing Statistical thoughts (통계적 사고와 그 함양에 관한 연구)

  • Kim, Sang-Lyong
    • Education of Primary School Mathematics / v.12 no.1 / pp.31-38 / 2009
  • This paper aims to develop a program that cultivates statistical ability in elementary students. For this purpose, I examined the relationship between mathematical thinking and statistical thinking. I developed statistical programs including classification, discussion of data, statistical problem generation, and a project program. As a result, this study suggests implications for further elementary statistical education.

Incremental Multi-classification by Least Squares Support Vector Machine

  • Oh, Kwang-Sik;Shim, Joo-Yong;Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.965-974 / 2003
  • In this paper we propose an incremental classification of multi-class data sets by the least squares support vector machine (LS-SVM). By encoding the output variable in the training data set appropriately, we obtain new output vectors for the training data set. Then, online LS-SVM is applied to each newly encoded output vector. The proposed method reduces the computational cost and allows the training to be performed incrementally. With the incremental formulation of an inverse matrix, the current information and new input data are used to build a new inverse matrix for the estimation of the optimal bias and Lagrange multipliers, so the computational difficulties of large-scale matrix inversion can be avoided. The performance of the proposed method is shown via numerical studies and compared with an artificial neural network.
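
The abstract does not give the incremental inverse formula itself. One standard identity that serves this purpose is the bordered (block) matrix inverse, shown here as an assumption about the kind of update involved rather than as the authors' exact formulation. The symbols $A_n$, $\mathbf{b}$, $c$, and $s$ are introduced only for this illustration: $A_n$ is the current system matrix whose inverse is already known, and $\mathbf{b}$, $c$ are the new column and diagonal entry contributed by one additional training point.

```latex
A_{n+1} =
\begin{pmatrix}
A_n & \mathbf{b} \\
\mathbf{b}^{\top} & c
\end{pmatrix},
\qquad
s = c - \mathbf{b}^{\top} A_n^{-1} \mathbf{b},
\qquad
A_{n+1}^{-1} =
\begin{pmatrix}
A_n^{-1} + \dfrac{1}{s}\, A_n^{-1} \mathbf{b}\, \mathbf{b}^{\top} A_n^{-1}
  & -\dfrac{1}{s}\, A_n^{-1} \mathbf{b} \\[2ex]
-\dfrac{1}{s}\, \mathbf{b}^{\top} A_n^{-1}
  & \dfrac{1}{s}
\end{pmatrix}
```

Given $A_n^{-1}$, the update needs only matrix-vector products, so each new observation costs $O(n^2)$ operations instead of the $O(n^3)$ of a fresh inversion, provided the Schur complement $s$ is nonzero.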

A review of tree-based Bayesian methods

  • Linero, Antonio R.
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.543-559 / 2017
  • Tree-based regression and classification ensembles form a standard part of the data-science toolkit. Many commonly used methods take an algorithmic view, proposing greedy methods for constructing decision trees; examples include the classification and regression trees algorithm, boosted decision trees, and random forests. Recent history has seen a surge of interest in Bayesian techniques for constructing decision tree ensembles, with these methods frequently outperforming their algorithmic counterparts. The goal of this article is to survey the landscape surrounding Bayesian decision tree methods, and to discuss recent modeling and computational developments. We provide connections between Bayesian tree-based methods and existing machine learning techniques, and outline several recent theoretical developments establishing frequentist consistency and rates of convergence for the posterior distribution. The methodology we present is applicable for a wide variety of statistical tasks including regression, classification, modeling of count data, and many others. We illustrate the methodology on both simulated and real datasets.

A Co-Evolutionary Computing for Statistical Learning Theory

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.4 / pp.281-285 / 2005
  • Learning and evolving are two basics of data mining. Compared with classical learning theory, which is based on an objective function that minimizes training errors, evolutionary computing has recently offered an efficient approach for constructing an optimal model without minimizing training errors. The global search of evolutionary computing in the solution space can resolve the local optima problems of learning models. In this research, by combining a co-evolving algorithm with statistical learning theory, we propose a co-evolutionary computing approach for statistical learning theory that overcomes its local optima problems. We apply the proposed model to classification and prediction problems. In the experimental results, we verify the improved performance of our model using data sets from the UCI machine learning repository and KDD Cup 2000.

COMPARISON OF SPECKLE REDUCTION METHODS FOR MULTISOURCE LAND-COVER CLASSIFICATION BY NEURAL NETWORK : A CASE STUDY IN THE SOUTH COAST OF KOREA

  • Ryu, Joo-Hyung;Won, Joong-Sun;Kim, Sang-Wan
    • Proceedings of the KSRS Conference / 1999.11a / pp.144-147 / 1999
  • The objective of this study is to quantitatively evaluate the effects of various SAR speckle reduction methods on multisource land-cover classification by a backpropagation neural network, especially over the coastal region. Land-cover classification using a neural network has an advantage over conventional statistical approaches in that it is distribution-free and no prior knowledge of the statistical distributions of the classes is needed. The goal of multisource land-cover classification using data acquired by different sensors is to reduce the classification error, and consequently SAR can be utilized as a complementary tool to optical sensors. SAR speckle is, however, a serious limiting factor when it is exploited for land-cover classification. To reduce this problem, we test various speckle reduction methods including Frost, Median, Kuan and EPOS. By interpreting the weights for the training pixel samples, the “Importance Value” of each speckle-reduced SAR image can be estimated based on its contribution to the classification. In this study, the “Importance Value” is used as a criterion of effectiveness.

Shape Property Study of Hangul Font for Font Classification (글꼴 분류를 위한 한글 글꼴의 모양 특성 연구)

  • Kim, Hyun-Young;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.20 no.9 / pp.1584-1595 / 2017
  • Each cultural community has developed a variety of fonts to express its own language and characters. Hangul has also diversified its font shapes by changing the composition ratio and look of the consonants and vowels. However, because of this variety, a considerable amount of time and effort must be devoted to selecting a specific font shape. This is related to the fact that current Hangul font services and classification systems handle a font only by its name or the name of its manufacturer, which means that there is no consensus on a font shape classification system for Hangul. In this study, we propose a shape property set that can be a basis for classifying Hangul fonts. The font shape property set was generated by performing statistical analysis on features that have been studied by font design experts, and was verified through a questionnaire using representative fonts based on the classification scheme defined by the Hangul font design classification system standard. This study is meaningful in that it studies shape classification properties using the K-means and PCA statistical techniques on font data, rather than from the perspective of the design field.

A Review of Artificial Intelligence Models in Business Classification

  • Han, In-goo;Kwon, Young-sig;Jo, Hong-kyu
    • Journal of Intelligence and Information Systems / v.1 no.1 / pp.23-41 / 1995
  • Business researchers have traditionally used statistical techniques for classification. In the late 1980s, inductive learning started to be used for business classification. Recently, neural networks began to be applied to business classification. This study reviews the business classification studies, identifies the neural network approach as the most powerful classification tool, and discusses the problems and issues in neural network applications.

Statistical Approach to Sentiment Classification using MapReduce (맵리듀스를 이용한 통계적 접근의 감성 분류)

  • Kang, Mun-Su;Baek, Seung-Hee;Choi, Young-Sik
    • Science of Emotion and Sensibility / v.15 no.4 / pp.425-440 / 2012
  • As the scale of the internet grows, the amount of subjective data increases. Thus, a need arises to classify subjective data automatically. Sentiment classification is the classification of subjective data by various types of sentiment. Previous sentiment classification research has focused on NLP (Natural Language Processing) and sentiment word dictionaries, and has two critical problems. First, the performance of morpheme analysis in NLP has fallen short of expectations. Second, it is not easy to choose sentiment words and determine how much sentiment a word carries. To solve these problems, this paper suggests combining web-scale data with a statistical approach to sentiment classification. The proposed method uses statistics of words from web-scale data rather than finding the meaning of a word. This approach differs from previous research that depended on NLP algorithms in that it focuses on data. Hadoop and MapReduce will be used to handle web-scale data.
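
To make the idea of classifying by word statistics concrete, here is a small single-process sketch written in the map/reduce style. It is not the paper's method: the abstract does not say which statistics are used, so the add-one smoothed log-odds score below is an assumption, as are the function names and the four-document toy corpus. A real deployment would run the map and reduce steps on Hadoop rather than in one Python process.

```python
import math
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map step: emit ((word, label), 1) for every word in a labeled document."""
    text, label = doc
    return [((word, label), 1) for word in text.lower().split()]

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per (word, label) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def classify(text, counts):
    """Score a text by summed log-odds of its words (add-one smoothing)."""
    vocab = {word for word, _ in counts}
    totals = {lab: sum(n for (_, l), n in counts.items() if l == lab)
              for lab in ("pos", "neg")}
    score = 0.0
    for word in text.lower().split():
        p_pos = (counts.get((word, "pos"), 0) + 1) / (totals["pos"] + len(vocab))
        p_neg = (counts.get((word, "neg"), 0) + 1) / (totals["neg"] + len(vocab))
        score += math.log(p_pos / p_neg)
    return "pos" if score >= 0 else "neg"

# Hypothetical toy corpus standing in for web-scale data.
docs = [
    ("great movie loved it", "pos"),
    ("loved the great acting", "pos"),
    ("terrible plot hated it", "neg"),
    ("hated the terrible acting", "neg"),
]
counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(classify("great acting loved", counts))
print(classify("terrible plot", counts))
```

No morpheme analysis or sentiment dictionary is involved; the label comes entirely from the counts, which is the data-driven property the abstract emphasizes.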

The Effective Training Method for the Statistical Classification of Remotely Sensed Imagery (위성영상의 통계적 분류를 위한 유효 트레이닝 기법에 관한 연구)

  • 이병길;김용일;어양담
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.17 no.3 / pp.225-231 / 1999
  • In statistical analysis of remotely sensed data, the means and variances of each class are used as the basis for determining statistical similarity. Therefore, the overall accuracy of classification is affected by the training results. Pixel values are ideally assumed to follow normal distributions, but in practice they show some aggregations and biases. Such anomalies in the distribution can greatly affect the classification results as well as the variances of the training results. In this study, relationships between the inferential variances of the training sets and the distributions of pixel values are examined, and the resulting changes in classification results are studied. Furthermore, a training method that minimizes the effect of underestimating the variances is proposed.
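
The sensitivity to training variances can be illustrated with a one-band Gaussian maximum likelihood classifier. This is a simplified sketch, not the paper's procedure: the class names, pixel values, and the artificially shrunk variance are hypothetical, and real imagery would use multiband means and covariance matrices.

```python
import math
from statistics import mean, pvariance

def train(samples_by_class):
    """Estimate (mean, variance) of the pixel values in each training class."""
    return {c: (mean(v), pvariance(v)) for c, v in samples_by_class.items()}

def log_likelihood(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def classify(x, stats):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    return max(stats, key=lambda c: log_likelihood(x, *stats[c]))

# Hypothetical single-band training pixels.
training = {
    "water": [18, 20, 22, 19, 21],
    "forest": [58, 62, 60, 55, 65],
}
stats = train(training)

# Same means, but the water variance is underestimated (assumed value 0.2).
underestimated = dict(stats, water=(stats["water"][0], 0.2))

print(classify(28, stats), classify(28, underestimated))
```

With the estimated variances the pixel value 28 is assigned to the nearer class, while shrinking that class's variance pushes the same pixel into the other class. This is the kind of change in classification results that the abstract attributes to underestimated variances.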