• Title/Summary/Keyword: statistical classifier

A Hybrid Method for classifying User's Asking Points (하이브리드 방법의 사용자 질의 의도 분류)

  • Kim, Harksoo;An, Young Hun;Seo, Jungyun
    • Journal of KIISE: Software and Applications / v.30 no.1_2 / pp.51-57 / 2003
  • For QA systems to return correct answer phrases, it is very important that they analyze users' intentions correctly and stably. To meet this need, we propose a question type classifier (i.e., an asking point identifier) for practical QA systems. The classifier uses a hybrid method that combines a statistical method with a rule-based method according to some heuristic rules. Owing to the hybrid method, the classifier reduces the time needed to construct rules manually, yields a high precision rate, and guarantees robustness. In the experiment, we achieved 80% accuracy in question type classification.
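
As a rough illustration of the hybrid idea, the sketch below applies hand-written rules first and falls back to a statistical model when no rule fires. The patterns, answer types, and fallback model are invented for illustration and are not the paper's actual resources.

```python
import re
from collections import Counter

# Illustrative rules only, not the paper's rule set.
RULES = [
    (re.compile(r"\bwho\b", re.I), "PERSON"),
    (re.compile(r"\bwhen\b", re.I), "DATE"),
    (re.compile(r"\bwhere\b", re.I), "LOCATION"),
]

def make_majority_model(training_labels):
    """Toy statistical fallback: always predict the majority answer type."""
    majority = Counter(training_labels).most_common(1)[0][0]
    return lambda question: majority

def classify_question(question, statistical_model):
    """Rules fire first; the statistical model handles everything else."""
    for pattern, answer_type in RULES:
        if pattern.search(question):
            return answer_type
    return statistical_model(question)

model = make_majority_model(["PERSON", "DATE", "PERSON"])
print(classify_question("Who wrote Hamlet?", model))           # PERSON, from a rule
print(classify_question("Name the tallest mountain.", model))  # PERSON, from the fallback
```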

Fault Location and Classification of Combined Transmission System: Economical and Accurate Statistic Programming Framework

  • Tavalaei, Jalal;Habibuddin, Mohd Hafiz;Khairuddin, Azhar;Mohd Zin, Abdullah Asuhaimi
    • Journal of Electrical Engineering and Technology / v.12 no.6 / pp.2106-2117 / 2017
  • An effective statistical feature extraction approach for sampling fault data in a combined transmission system is presented in this paper. The proposed algorithm achieves high accuracy at minimum cost in predicting fault location and fault type. It requires impedance measurement data from only one end of the transmission line. Modal decomposition is used to extract the positive-sequence impedance, and the fault signal is then decomposed using the discrete wavelet transform. Statistical sampling is used to extract appropriate fault features from the decomposed signal to train the classifier. A Support Vector Machine (SVM) is used to illustrate the performance of the statistical sampling. The overall sampling time does not exceed 1 1/4 cycles, including the interval time. The proposed method takes two sampling steps: the first takes 3/4 of a cycle of the during-fault impedance and the second takes 1/4 of a cycle of the post-fault impedance, with an interval of 1/4 cycle assumed between the two steps. Extensive studies using MATLAB show accurate fault location estimation and fault type classification by the proposed method. The classifier results are compared with well-established travelling wave methods, and the performance of the algorithms is analyzed and discussed.
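
A rough sketch of the statistical-sampling and SVM step, under assumed data shapes and a synthetic signal; the wavelet-decomposition stage and the paper's actual feature set are omitted, and scikit-learn supplies the classifier.

```python
import numpy as np
from sklearn.svm import SVC

SAMPLES_PER_CYCLE = 64  # assumed sampling rate (samples per power cycle)

def window_stats(w):
    """Simple statistical summary of one sampling window."""
    return [w.mean(), w.std(), w.max(), w.min()]

def fault_features(impedance, fault_index):
    """3/4-cycle during-fault window, 1/4-cycle interval, 1/4-cycle post-fault window."""
    during = impedance[fault_index : fault_index + 3 * SAMPLES_PER_CYCLE // 4]
    post = impedance[fault_index + SAMPLES_PER_CYCLE :
                     fault_index + 5 * SAMPLES_PER_CYCLE // 4]
    return np.array(window_stats(during) + window_stats(post))

# Synthetic stand-in data: rows of impedance samples with made-up fault-type labels.
rng = np.random.default_rng(0)
signals = rng.normal(size=(40, 4 * SAMPLES_PER_CYCLE))
labels = rng.integers(0, 3, size=40)
X = np.array([fault_features(s, SAMPLES_PER_CYCLE) for s in signals])

clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```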

Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation

  • Park, Eun-Jin;Kwon, Oh-Woog;Kim, Kangil;Kim, Young-Kil
    • ETRI Journal / v.37 no.3 / pp.541-550 / 2015
  • In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rule-based machine translation. Both the training dataset used to learn our proposed classifier and our feature extraction method affect the hybridization quality. To create such a training dataset, a previous approach used auto-evaluation metrics to determine, from a set of component machine translation (MT) systems, which gave the more accurate translation (by a comparative method). The most accurate translation was then labelled so as to indicate the MT system from which it came. In this previous approach, when the metric evaluation scores were low, there was a high level of uncertainty as to which of the component MT systems actually produced the better translation. To relax such uncertainty, or error in classification, we propose an alternative labeling approach: a cut-off method. In our experiments, using the aforementioned cut-off method in our proposed classifier, we achieved a translation accuracy of 81.5%, a 5.0% improvement over existing methods.
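
A minimal sketch of the cut-off labeling idea: sentence pairs whose automatic-metric scores differ by less than a threshold are discarded as too uncertain to label. The metric values and the threshold below are placeholders, not the paper's settings.

```python
def build_training_set(pairs, cutoff=0.1):
    """pairs: iterable of (features, smt_score, rbmt_score) per source sentence."""
    dataset = []
    for features, smt_score, rbmt_score in pairs:
        if abs(smt_score - rbmt_score) < cutoff:
            continue  # scores too close: uncertain which system is better, so skip
        label = "SMT" if smt_score > rbmt_score else "RBMT"
        dataset.append((features, label))
    return dataset

examples = [
    ({"length": 12}, 0.42, 0.55),  # clear RBMT win -> kept as a training example
    ({"length": 7},  0.48, 0.50),  # ambiguous -> discarded by the cut-off
]
print(build_training_set(examples))
```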

Study on the Effect of Discrepancy of Training Sample Population in Neural Network Classification

  • Lee, Sang-Hoon;Kim, Kwang-Eun
    • Korean Journal of Remote Sensing / v.18 no.3 / pp.155-162 / 2002
  • Neural networks have attracted attention as robust classifiers for remotely sensed imagery due to their statistical independence and learning ability. Artificial neural networks have also been reported to be more tolerant to noise and missing data. However, unlike conventional statistical classifiers, which use statistical parameters for classification, a neural network classifier uses individual training samples in the learning stage. The training performance of a neural network is known to be very sensitive to discrepancies in the number of training samples of each class. In this paper, the effect of the population discrepancy of training samples of each class was analyzed with a three-layered feed-forward network, and a method for reducing the effect was proposed and tested with a Landsat TM image. The results showed that the effect of the training sample size discrepancy should be carefully considered for faster and more accurate training of the network. It was also found that the proposed method, which makes the learning rate a function of the number of training samples in each class, resulted in faster and more accurate training of the network.
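
One plausible reading of the proposed remedy is to scale the learning rate per class so that over-represented classes do not dominate training; the inverse-proportional form below is an assumption for illustration, not necessarily the authors' exact function.

```python
import numpy as np

def class_learning_rates(labels, base_rate=0.1):
    """Smaller learning rate for classes with many samples, larger for rare classes."""
    classes, counts = np.unique(labels, return_counts=True)
    rates = base_rate * counts.min() / counts
    return dict(zip(classes, rates))

labels = np.array([0] * 500 + [1] * 50 + [2] * 10)
print(class_learning_rates(labels))
# e.g. class 0 -> 0.002, class 1 -> 0.02, class 2 -> 0.1: the rare class keeps the full base rate.
```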

Kernel Pattern Recognition using K-means Clustering Method (K-평균 군집방법을 이요한 가중커널분류기)

  • 백장선;심정욱
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.447-455 / 2000
  • We propose a weighted kernel pattern recognition method using the K-means clustering algorithm to reduce the computation and storage required for the full kernel classifier. This technique finds a set of reference vectors and weights which are used to approximate the kernel classifier. Since the hierarchical clustering method implemented in the Weighted Parzen Window (WPW) classifier is not able to rearrange the clusters properly, we adopt the K-means algorithm to find reference vectors and weights from the more properly rearranged clusters. We find that the proposed method outperforms the WPW method in the representativeness of the reference vectors and in data reduction.
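
A rough sketch of the approximation: K-means reference vectors stand in for the full training set, each weighted by its cluster size, and classification compares weighted kernel densities per class. The Gaussian kernel and bandwidth are assumptions, and scikit-learn's KMeans is used for clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_reference_vectors(X, n_refs=5, seed=0):
    """Cluster one class; return the reference vectors and their weights."""
    km = KMeans(n_clusters=n_refs, n_init=10, random_state=seed).fit(X)
    weights = np.bincount(km.labels_, minlength=n_refs) / len(X)
    return km.cluster_centers_, weights

def class_density(x, centers, weights, bandwidth=1.0):
    """Weighted sum of Gaussian kernels placed on the reference vectors."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    return (weights * np.exp(-d2 / (2 * bandwidth ** 2))).sum()

rng = np.random.default_rng(1)
class_a = rng.normal(0, 1, size=(200, 2))
class_b = rng.normal(3, 1, size=(200, 2))
refs = {c: fit_reference_vectors(X) for c, X in [("a", class_a), ("b", class_b)]}

x_new = np.array([0.5, 0.2])
print(max(refs, key=lambda c: class_density(x_new, *refs[c])))  # expected: "a"
```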

Multiclass Support Vector Machines with SCAD

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.19 no.5 / pp.655-662 / 2012
  • Classification is an important research field in pattern recognition with high-dimensional predictors. The support vector machine (SVM) is a penalized feature selector and classifier. It is based on the hinge loss function and a non-convex penalty function, the smoothly clipped absolute deviation (SCAD) suggested by Fan and Li (2001). We developed an algorithm for the multiclass SVM with the SCAD penalty function using the local quadratic approximation. For multiclass problems, we compared the performance of the developed method with that of the SVM using the $L_1$ and $L_2$ penalty functions.
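
For reference, a small sketch of the SCAD penalty of Fan and Li (2001) that the paper plugs into the multiclass SVM; the local quadratic approximation and the SVM fitting itself are not shown, and a = 3.7 is the value commonly recommended by Fan and Li.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty on coefficients beta."""
    b = np.abs(beta)
    small = lam * b                                            # |beta| <= lam
    middle = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    large = lam ** 2 * (a + 1) / 2                             # |beta| > a*lam (constant)
    return np.where(b <= lam, small, np.where(b <= a * lam, middle, large))

print(scad_penalty(np.array([0.05, 0.5, 5.0]), lam=0.2))
```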

Visualizing SVM Classification in Reduced Dimensions

  • Huh, Myung-Hoe;Park, Hee-Man
    • Communications for Statistical Applications and Methods / v.16 no.5 / pp.881-889 / 2009
  • Support vector machines (SVMs) are known as flexible and efficient classifiers of multivariate observations, producing a hyperplane or a high-dimensional curved surface in the feature space that best separates the training samples by known groups. As various methodological extensions have been made to SVM classifiers in recent years, it has become more difficult to understand the constructed model intuitively. The aim of this paper is to visualize various SVM classifications, tuned by several parameters, in reduced dimensions, so that data analysts can form a tangible image of the models the machine has produced.
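
A minimal sketch of the general idea with scikit-learn and matplotlib: project the data to two principal components, fit an SVM in the reduced space, and draw its decision regions. Fitting after the projection is a simplification made here and may differ from the paper's procedure.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)   # reduce to two dimensions
clf = SVC(kernel="rbf", C=1.0).fit(X2, y)

# Evaluate the classifier on a grid to draw its decision regions.
xx, yy = np.meshgrid(np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200),
                     np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200))
zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X2[:, 0], X2[:, 1], c=y, edgecolors="k")
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.show()
```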

Nomogram building to predict dyslipidemia using a naïve Bayesian classifier model (순수 베이지안 분류기 모델을 사용하여 이상지질혈증을 예측하는 노모 그램 구축)

  • Kim, Min-Ho;Seo, Ju-Hyun;Lee, Jea-Young
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.619-630 / 2019
  • Dyslipidemia is a representative chronic disease affecting Koreans that requires continuous management. It is also a known risk factor for cardiovascular diseases such as hypertension and diabetes. However, it is difficult to diagnose vascular disease without a medical examination. This study identifies risk factors for the recognition and prevention of dyslipidemia and, by integrating them, constructs a statistical nomogram that visualizes the risk factors and predicts the incidence rate. The data were from the Korean National Health and Nutrition Examination Survey (KNHANES) for 2013-2016. First, a chi-squared test identified twelve risk factors of dyslipidemia. We then used a naïve Bayesian classifier model to construct a nomogram for dyslipidemia. The constructed nomogram was verified using a receiver operating characteristic (ROC) curve and a calibration plot. Finally, we compared the logistic nomogram presented previously with the Bayesian nomogram proposed in this study.
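
A hedged sketch of the modelling step only: a naive Bayes classifier over binary risk factors, checked with an ROC AUC. The data below are synthetic stand-ins, not KNHANES variables, and the nomogram construction itself is not shown.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 12 binary risk factors and a binary outcome.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 12))
logit = X @ rng.normal(0.5, 0.3, size=12) - 3
y = rng.random(1000) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = BernoulliNB().fit(X_tr, y_tr)
prob = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, prob))
```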

Proposal of Weight Adjustment Methods Using Statistical Information in Fuzzy Weighted Mean Classifiers (퍼지 가중치 평균 분류기에서 통계 정보를 활용한 가중치 설정 기법의 제안)

  • Woo, Young-Woon;Heo, Gyeong-Yong;Kim, Kwang-Baek
    • Journal of the Korea Society of Computer and Information / v.14 no.7 / pp.9-15 / 2009
  • The fuzzy weighted mean classifier is one of the most common classification models and can achieve high performance by adjusting its weights. However, the weights have generally been decided based on the experience of experts, which makes the resulting classifiers suffer from a lack of consistency and objectivity. To resolve this problem, in this paper, a weight deciding method based on the statistics of the data is introduced, which ensures that the learned classifiers are consistent and objective. To investigate the effectiveness of the proposed methods, the Iris data set available from the UCI Machine Learning Repository is used, and promising results are obtained.
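
One way to read the statistical weighting idea is sketched below: each feature's weight grows with its between-class spread relative to its within-class spread, and the weights then enter a weighted-mean closeness score. This particular formula is an assumption for illustration, not necessarily the paper's method.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
class_means = np.array([X[y == c].mean(axis=0) for c in classes])
within_std = np.array([X[y == c].std(axis=0) for c in classes]).mean(axis=0)

# Feature weights from the data: larger when classes are well separated on a feature.
weights = class_means.std(axis=0) / within_std
weights /= weights.sum()
print(np.round(weights, 3))

def classify(x):
    """Weighted-mean score of per-feature closeness to each class mean."""
    closeness = 1.0 / (1.0 + np.abs(class_means - x) / within_std)
    return classes[np.argmax(closeness @ weights)]

print(classify(X[0]), y[0])  # predicted vs. true class for the first sample
```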

Fast Automatic Modulation Classification by MDC and kNNC (MDC와 kNNC를 이용한 고속 자동변조인식)

  • Park, Cheol-Sun;Yang, Jong-Won;Nah, Sun-Phil;Jang, Won
    • Journal of the Korea Institute of Military Science and Technology / v.10 no.4 / pp.88-96 / 2007
  • This paper discusses fast modulation classifiers capable of classifying both analog and digital modulation signals in wireless communications applications. A total of 7 statistical signal features are extracted and used to classify 9 modulated signals. We investigate the performance of two types of fast modulation classifiers (i.e., 2 nearest neighbor classifiers and 2 minimum distance classifiers) and compare their performance with that of state-of-the-art classification methods such as the SVM classifier. Computer simulations indicate good performance on an AWGN channel, even at low signal-to-noise ratios, for the minimum distance classifiers (MDC for short) and the k-nearest neighbor classifiers (kNNC for short). Besides their good performance, these classifiers are considered ideal candidates for real-time software radio because of their fast modulation classification capability.
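
A compact sketch of the two fast classifiers being compared, using scikit-learn's nearest-centroid estimator as the minimum distance classifier (MDC) and a k-nearest neighbor classifier (kNNC); the 7-dimensional features and 9 modulation classes are synthetic placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

# Synthetic stand-in: 9 modulation classes, 7 statistical features each.
rng = np.random.default_rng(0)
n_classes, n_features = 9, 7
centers = rng.normal(0, 3, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(0, 1, size=(50, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), 50)

mdc = NearestCentroid().fit(X, y)                # minimum distance classifier
knnc = KNeighborsClassifier(n_neighbors=5).fit(X, y)

x_new = centers[3] + rng.normal(0, 1, size=n_features)
print("MDC:", mdc.predict([x_new])[0], "kNNC:", knnc.predict([x_new])[0])
```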