• Title/Summary/Keyword: misclassification

A new classification method using penalized partial least squares (벌점 부분최소자승법을 이용한 분류방법)

  • Kim, Yun-Dae;Jun, Chi-Hyuck;Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society, v.22 no.5, pp.931-940, 2011
  • Classification generates a rule for assigning objects to several categories based on a learning sample. A good classification model should classify new objects with low misclassification error. Many types of classification methods have been developed, including logistic regression, discriminant analysis, and tree-based methods. This paper presents a new classification method using penalized partial least squares. Penalized partial least squares can make the model more robust and remedy the multicollinearity problem. This paper compares the proposed method with logistic regression and PCA-based discriminant analysis on real and artificial data. It is concluded that the new method has better power than the other methods.

Investigations on the Optimal Support Vector Machine Classifiers for Predicting Design Feasibility in Analog Circuit Optimization

  • Lee, Jiho;Kim, Jaeha
    • JSTS: Journal of Semiconductor Technology and Science, v.15 no.5, pp.437-444, 2015
  • In simulation-based circuit optimization, many simulation runs may be wasted on evaluating infeasible designs, i.e., designs that do not meet the constraints. To avoid such waste, this paper investigates the use of support vector machine (SVM) classifiers to predict a design's feasibility prior to simulation, and the optimal selection of the SVM parameters, namely the Gaussian kernel shape parameter ${\gamma}$ and the misclassification penalty parameter C. These parameters affect the complexity as well as the accuracy of the model that the SVM represents. For instance, a higher ${\gamma}$ is good for detailed modeling and a higher C is good for rejecting noise in the training set. However, our empirical study shows that a low ${\gamma}$ value is preferable due to the high spatial correlation among the circuit design candidates, while C has negligible impact due to the smooth and clean constraint boundaries of most circuit designs. The experimental results with an LC-tank oscillator example show that an optimal selection of these parameters can improve the prediction accuracy from 80% to 98% and the model complexity by $10{\times}$.
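
The role of the kernel shape parameter can be illustrated with a minimal sketch. It assumes the common Gaussian (RBF) kernel form exp(-γ‖x − y‖²), which the abstract does not spell out; the function name `rbf_kernel` and the two sample points are hypothetical and not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel, assumed form: exp(-gamma * squared distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical design points (illustrative values only).
p = (0.0, 0.0)
q = (1.0, 1.0)

# A low gamma keeps distant points similar (smooth model);
# a high gamma makes similarity fall off quickly (detailed model).
for gamma in (0.01, 1.0, 100.0):
    print(gamma, rbf_kernel(p, q, gamma))
```

Under this form, a low γ treats neighboring design candidates as strongly related, which is one way to read the abstract's remark about high spatial correlation.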

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences, v.7 no.1, pp.117-124, 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2342 different sentences, comprising over five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender as an independent factor. The analyses revealed that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors come from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%. Most of the inaccuracy results from the classification of female speakers in non-sibilants. The results are accounted for by resorting to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.

Epidemiologic Methods and Study Designs for Investigating Adverse Health Effects of Ambient Air Pollution (대기오염의 건강 영향 평가를 위한 역학연구 설계 및 방법론)

  • Kim, Ho;Lee, Jong-Tae
    • Journal of Preventive Medicine and Public Health, v.34 no.2, pp.119-126, 2001
  • Air pollution epidemiologic studies are intrinsically difficult because the expected effect size at general environmental levels is small, exposure and misclassification of exposure are common, and exposure is not selective to a specific pollutant. In this review paper, epidemiologic study designs and analytic methods are described, and two nationwide projects on air pollution epidemiology are introduced. This paper also demonstrates that possible confounding issues in time-series analysis can be resolved and that the impact of using data from ambient monitoring stations may not be critical. We provide a basic understanding of the types of air pollution epidemiologic study designs, which can be subdivided by the mode of air pollution effects on human health (acute or chronic). Despite improvements in the area of air pollution epidemiologic studies, we should emphasize that elaborate models and statistical techniques cannot compensate for inadequate study design or poor data collection.

The Unified Framework for AUC Maximizer

  • Jun, Jong-Jun;Kim, Yong-Dai;Han, Sang-Tae;Kang, Hyun-Cheol;Choi, Ho-Sik
    • Communications for Statistical Applications and Methods, v.16 no.6, pp.1005-1012, 2009
  • The area under the curve (AUC) is commonly used as a measure of the receiver operating characteristic (ROC) curve, which displays the performance of a set of binary classifiers for all feasible ratios of the costs associated with the true positive rate (TPR) and the false positive rate (FPR). In the bipartite ranking problem, where one has to compare two different observations and decide which one is "better", the AUC measures the probability that the ranking score of a randomly chosen sample in one class is larger than that of a randomly chosen sample in the other class. Hence, the function that maximizes the AUC of a bipartite ranking problem differs from the function that maximizes the accuracy (minimizes the misclassification error rate) of a binary classification problem. In this paper, we develop a way to construct a unified framework for AUC maximizers, including support vector machines based on maximizing a large margin and logistic regression based on estimating the posterior probability. Moreover, we develop an efficient algorithm for the proposed unified framework. Numerical results show that the proposed unified framework can treat various methodologies successfully.
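
The pairwise reading of the AUC described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the sample scores, the 0.5 threshold, and the convention of counting ties as half a win are all assumptions made here.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive sample
    receives the higher score; ties count as 0.5 (an assumed convention)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def error_rate(pos_scores, neg_scores, threshold):
    """Misclassification error rate when scores above the threshold are
    predicted positive."""
    wrong = sum(s <= threshold for s in pos_scores)
    wrong += sum(s > threshold for s in neg_scores)
    return wrong / (len(pos_scores) + len(neg_scores))


# Hypothetical ranking scores (illustrative values only).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

print(pairwise_auc(pos, neg))     # 8 of 9 pairs are ordered correctly
print(error_rate(pos, neg, 0.5))  # 2 of 6 samples are misclassified
```

The two quantities are computed from the same scores but need not move together, which is the distinction the abstract draws between ranking and classification objectives.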

A Classification Analysis using Bayesian Neural Network (베이지안 신경망을 이용한 분류분석)

  • Hwang, Jin-Soo;Choi, Seong-Yong;Jun, Hong-Suk
    • Journal of the Korean Data and Information Science Society, v.12 no.2, pp.11-25, 2001
  • There are several classification algorithms for modeling the relations, patterns, and rules that exist in data. We learn to classify objects on the basis of instances presented to us, not by being given a set of classification rules. Bayesian learning uses a probability distribution to express our knowledge about unknown parameters and updates this knowledge by the laws of probability as evidence is gathered from data. Neural network models, in turn, are designed to predict an unknown category or quantity on the basis of known attributes through training. In this paper, we compare the misclassification error rates of the Bayesian neural network method with those of other classification algorithms (CHAID, CART, and QUEST) using several data sets.

Dynamic Text Categorizing Method using Text Mining and Association Rule

  • Kim, Young-Wook;Kim, Ki-Hyun;Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information, v.23 no.10, pp.103-109, 2018
  • In this paper, we propose a dynamic document classification method that breaks away from existing document classification methods, which rely on artificial, supplier-focused categorization rules, and instead changes its categorization rules according to users' needs or social trends. The core of this method is that it creates classification criteria in real time using topic modeling techniques, without standardized category rules, so users are not forced into unnecessary frames. In addition, it can search for details through relevance analysis by calculating the relationships between words, which are difficult to grasp from word frequency alone. Rather than for logical and systematic documents, the proposed method can be used more effectively for situation analysis and for retrieving information from unstructured data that do not fit existing classification categories, such as VOC (Voice of Customer), SNS, and customer reviews from Internet shopping malls, and it can react flexibly to users' needs. Moreover, it has no step in which suppliers select the classification rules, and when a misclassification occurs it requires no manual work, which reduces unnecessary workload.

Performance Improvement of WTCP by Differentiated Handling of Congestion and Random Loss (혼잡 및 무선 구간 손실의 차별적 처리를 통한 WTCP 성능 개선)

  • Cho, Nam-Jin;Lee, Sung-Chang
    • Journal of the Institute of Electronics Engineers of Korea TC, v.45 no.9, pp.30-38, 2008
  • Traditional TCP was designed assuming wired networks. Thus, if it is used in networks consisting of both wired and wireless links, all packet losses, including random losses in wireless links, are regarded as network congestion losses. This misclassification of packet losses causes an unnecessary reduction of the transmission rate and results in wasted bandwidth. In this paper, we present a WTCP (wireless TCP) congestion control algorithm that differentiates random losses more accurately and adopts improved congestion control, which results in better network throughput. To evaluate the performance of the proposed scheme, we compared the proposed algorithm with TCP Westwood and TCP Veno via simulations.

Vehicle Shadow Removal For Intelligent Traffic System

  • Jang, Dae-Geun;Kim, Eui-Jeong
    • Journal of information and communication convergence engineering, v.4 no.3, pp.123-129, 2006
  • The limited number of roads and the increasing number of vehicles demand automatic regulation of speeding, illegal, and overloaded vehicles, as well as automatic charge calculation depending on the type of vehicle. To meet such requirements, it is important to remove the shadow of the vehicle when processing and recognizing an image captured by a camera. The shadow of the vehicle is likely to cause misclassification of the vehicle type because of the errors that occur when detecting the geometrical properties of the vehicle. When the shadows of two different vehicles overlap, it is difficult to identify the types of the vehicles accurately, and they may be misclassified. In this paper, we propose a robust algorithm that removes the shadow of a vehicle by calculating the luminance, the chrominance, and the gradient density of the cast shadow from information acquired using image subtraction of the background, and that recognizes the actual vehicle figure. Even when it is hard to detect and separate a target vehicle from its shadow because the shadows of vehicles are attached to each other, our algorithm can detect the vehicle figure alone. We implemented our system with a general camera and conducted experiments on various vehicles on general roads, and found that our vehicle shadow removal algorithm is efficient for detecting and recognizing vehicles.

Wireless Internet Service Classification using Data Mining (데이터 마이닝을 이용한 무선 인터넷 서비스 분류기법)

  • Lee, Seong-Jin;Song, Jong-Woo;Ahn, Soo-Han;Won, You-Jip;Chang, Jae-Sung
    • Journal of KIISE: Information Networking, v.36 no.3, pp.153-162, 2009
  • It is challenging for service operators to accurately classify different services, which run on various wireless networks and numerous platforms. This work focuses on the design and implementation of a classifier that accurately classifies applications captured from the WiBro network. The notion of a session, instead of the commonly used flow, is introduced to develop the classifier. Based on the session information of given traffic, two classification algorithms are presented: Classification and Regression Tree (CART) and Support Vector Machine (SVM). Both algorithms classify accurately and effectively, with misclassification rates of 0.85% and 0.94%, respectively. This work shows that the classifier using CART offers easier interpretation of the results and easier implementation.