• Title/Summary/Keyword: Feature Distribution

Analysis of Classification Accuracy for Multiclass Problems (다중 클래스 분포 문제에 대한 분류 정확도 분석)

  • 최의선;이철희
    • Proceedings of the IEEK Conference / 2000.06d / pp.190-193 / 2000
  • In this paper, we investigate the distribution of classification accuracies of multiclass problems in the feature space and analyze the performance of conventional feature extraction algorithms. To find the distribution of classification accuracies, we sample the feature space and compute the classification accuracy corresponding to each sampling point. Experimental results show that there exist much better feature sets that conventional feature extraction algorithms fail to find. In addition, the distribution of classification accuracies is useful for developing and evaluating feature extraction algorithms.

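The sampling procedure described in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' setup: the data are 2-D toy points, each sampled "feature" is a 1-D projection direction given by an angle, and accuracy is measured with a nearest-class-mean classifier on the training points themselves.

```python
import math

def project(points, angle):
    """Project 2-D points onto the unit direction at the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [x * c + y * s for x, y in points]

def nearest_mean_accuracy(values, labels):
    """Accuracy of a nearest-class-mean classifier on 1-D values (resubstitution)."""
    classes = sorted(set(labels))
    means = {
        k: sum(v for v, l in zip(values, labels) if l == k) / labels.count(k)
        for k in classes
    }
    correct = sum(
        1
        for v, l in zip(values, labels)
        if min(classes, key=lambda k: abs(v - means[k])) == l
    )
    return correct / len(values)

def accuracy_distribution(points, labels, n_samples=8):
    """Sample projection directions on a regular grid of angles in [0, pi)
    and record the classification accuracy at each sampling point."""
    result = []
    for i in range(n_samples):
        angle = math.pi * i / n_samples
        result.append((angle, nearest_mean_accuracy(project(points, angle), labels)))
    return result

# Hypothetical three-class toy data, separable along the x axis only.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1, 2, 2]
dist = accuracy_distribution(points, labels)
for angle, acc in dist:
    print(f"angle={angle:.2f} rad  accuracy={acc:.2f}")
```

Comparing the best accuracy in such a table with the accuracy of the feature chosen by a given extraction algorithm is one way to see whether the algorithm misses better feature sets.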

A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / v.13 no.5 / pp.1397-1409 / 2017
  • For text categorization, selecting distinctive text features is important because of the high dimensionality of the feature space. Reducing the dimension of the feature space decreases processing time and increases accuracy. In the current study, we introduce a novel statistical feature selection approach for text categorization. This approach measures the term distribution across all documents in the collection, the term distribution within a given category, and the term distribution in a given class relative to other classes. The results show the superiority of the proposed method over traditional feature selection methods.

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • MALSORI / v.55 / pp.131-142 / 2005
  • A mismatch between the training and the test conditions often causes a drastic decrease in the performance of speech recognition systems. In this paper, non-linear transformation techniques based on histogram equalization in the acoustic feature space are studied for reducing this mismatch. The purpose of histogram equalization (HEQ) is to convert the probability distribution of test speech into the probability distribution of training speech. Conventional histogram equalization methods consider only the probability distribution of the test speech, but for noise-corrupted test speech this probability distribution is itself distorted. The transformation function obtained from the distorted probability distribution may cause mis-transformation of feature vectors, which degrades the performance of histogram equalization. Therefore, this paper proposes a new method of calculating a noise-removed probability distribution, based on the assumption that the CDF of noisy speech feature vectors consists of a speech feature vector component and a noise feature vector component; this compensated probability distribution is then used in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. For the multi-training condition, the proposed method is also better than the baseline system.

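As a rough illustration of the conventional HEQ step that the abstract above builds on, the sketch below maps each test value through the empirical test CDF and then through the inverse empirical training CDF. It covers only plain histogram equalization on scalar features. The noise compensation proposed in the paper is not reproduced, and the function name and toy values are assumptions made for this example.

```python
from bisect import bisect_right

def histogram_equalize(test_values, train_values):
    """Map each test value to the training value at the same empirical CDF level.

    Plain (uncompensated) histogram equalization on one scalar feature;
    a sketch, not the compensated method described in the abstract.
    """
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n_test, n_train = len(test_sorted), len(train_sorted)
    equalized = []
    for x in test_values:
        # Number of test samples <= x, so the empirical CDF is rank / n_test.
        rank = bisect_right(test_sorted, x)
        # Smallest k with k / n_train >= rank / n_test (integer ceiling division).
        k = -(-rank * n_train // n_test)
        equalized.append(train_sorted[max(k, 1) - 1])
    return equalized

# Hypothetical toy values: the test features are shifted and scaled
# relative to the training features.
train = [0, 1, 2, 3]
test = [10, 30, 20, 40]
equalized = histogram_equalize(test, train)
print(equalized)  # [0, 2, 1, 3]
```

Because the mapping depends only on ranks within the test data, any distortion of the test distribution by noise carries straight into the transformation, which is the weakness the abstract points out.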

Learning Algorithm for Multiple Distribution Data using Haar-like Feature and Decision Tree (다중 분포 학습 모델을 위한 Haar-like Feature와 Decision Tree를 이용한 학습 알고리즘)

  • Kwak, Ju-Hyun;Woen, Il-Young;Lee, Chang-Hoon
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.43-48 / 2013
  • Adaboost is widely used as a boosting algorithm for Haar-like features in face detection. It performs very effectively on a single distribution model. However, when detecting frontal and side face images at the same time, Adaboost shows its limitations on multiple distribution data because it uses a linear combination of basic classifiers. This paper proposes HDCT, a modified decision tree algorithm for Haar-like features. We also tested the performance of HDCT compared with Adaboost on multiple distributed image recognition.

GMM Based Voice Conversion Using Kernel PCA (Kernel PCA를 이용한 GMM 기반의 음성변환)

  • Han, Joon-Hee;Bae, Jae-Hyun;Oh, Yung-Hwan
    • MALSORI / no.67 / pp.167-180 / 2008
  • This paper describes a novel spectral envelope conversion method based on the Gaussian mixture model (GMM). The core of the method is to rearrange source feature vectors from the input space into transformed feature vectors in a feature space, for better GMM modeling of the source and target features. The quality of statistical modeling depends on the distribution and the dimension of the data. The proposed method transforms both the distribution and the dimension of the data, which makes it possible to model the same data with a different configuration. Because the converted feature vectors must lie in the input space, only the source feature vectors are rearranged in the feature space, and the target feature vectors remain unchanged, for the joint pdf of source and target features using KPCA. Experimental results show that the proposed method outperforms the conventional GMM-based conversion method in various training environments.

An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents (문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.804-816 / 2007
  • Sample training data for machine learning often contain irrelevant information or redundant concepts. The original data may also include noise. If the information collected for constructing a learning model is not reliable, it is difficult to obtain accurate information. The system therefore attempts to find relations or regularities between features and categories in the learning phase. Feature selection removes irrelevant or redundant information before the learning model is constructed, in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents in each class and the length of each document. In practice, however, it is difficult not only to prepare a set of documents of almost equal length, but also to define a number of classes with a fixed number of documents. In this paper, we propose a new feature selection method that considers the impurities among the words and the unbalanced distribution of documents across categories. We obtain feature candidates using the word impurity and then select the final features by taking the unbalanced distribution of documents into account. Experiments demonstrate that our method performs better than existing methods.
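
The abstract above does not say how word impurity is computed. As one hedged illustration, the sketch below uses Gini impurity over per-class document rates, so that a class with many documents does not dominate the score. Both the choice of Gini impurity and the rate normalization are assumptions for this example, not the authors' definition, and the function names are hypothetical.

```python
from collections import Counter

def gini_impurity(weights):
    """Gini impurity of a non-negative weight vector (0.0 means perfectly pure)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weights)

def word_impurity(docs, labels, word):
    """Impurity of `word` across classes.

    Uses the fraction of documents in each class that contain the word,
    rather than raw counts, as a simple correction for unbalanced classes.
    """
    class_sizes = Counter(labels)
    hits = Counter(label for doc, label in zip(docs, labels) if word in doc)
    rates = [hits[c] / class_sizes[c] for c in sorted(class_sizes)]
    return gini_impurity(rates)

# Hypothetical unbalanced toy corpus: three "sport" documents, one "politics".
docs = [{"ball", "game"}, {"ball", "win"}, {"game", "win"}, {"vote", "game"}]
labels = ["sport", "sport", "sport", "politics"]

ball_score = word_impurity(docs, labels, "ball")  # appears only in "sport"
game_score = word_impurity(docs, labels, "game")  # appears in both classes
print(ball_score, game_score)
```

Under this scoring, words with low impurity are concentrated in one class and make natural feature candidates.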

Nonlinear Tolerance Allocation for Assembly Components (조립품을 위한 비선형 공차할당)

  • Kim, Kwang-Soo;Choi, Hoo-Gon
    • IE interfaces / v.16 no.spc / pp.39-44 / 2003
  • As one of many design variables, dimension tolerances restrict the amount of size variation in a manufactured feature while ensuring functionality. In this study, a nonlinear integer model is formulated to allocate the optimal tolerance to each individual feature at a minimum manufacturing cost. While many previous tolerance allocation studies use a normal distribution, whose symmetry determines statistically worst tolerances, an asymmetrical distribution is more realistic because its mean is not always coincident with the process center. The model therefore allocates the optimal tolerance to a feature based on a beta distribution at a minimum total cost. The total cost as a function of tolerances is defined by machining cost and quality loss. After the convexity of the manufacturing cost is checked with the Hessian matrix, the model is solved by the Complex Method. Finally, a numerical example is presented, demonstrating successful model implementation for a nonlinear design case.

A Study on Acceptance of Customer of Digital Contents Distribution Site's for Economy Commerce (디지털콘텐츠 경제 상거래를 위한 유통 사이트 고객 수용도에 관한 연구)

  • Lee, Jae-Kwang;Kwon, Hyeog-In
    • International Commerce and Information Review / v.8 no.4 / pp.3-22 / 2006
  • Recently, the use of and demand for digital contents have increased as the number of internet users has grown. Accordingly, sites that distribute digital contents commercially have become more important. Models for measuring the acceptance of web sites are being actively studied; however, for web sites that trade and distribute the specific goods called digital contents, the theoretical basis and models of customer acceptance are insufficient. In addition, existing research on the acceptance of commercial web sites is limited in its ability to explain systematically what influences the acceptance of digital contents distribution sites, because that research directly connects the features of web sites, purchases on web sites, or the features of buyers with acceptance. For that reason, it is hard to reflect the features of digital contents. To measure customers' acceptance of web sites that distribute digital images, this research is based on the Technology Acceptance Model (TAM) by Davis. It identifies the significant factors from a survey of users of a digital image distribution site, and shows that TAM, adapted to the analysis of acceptance of a new site, can explain the state of use of the digital image distribution site. The research provides an evaluation of digital image distribution sites and an operating strategy for them as a new business model.

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.3725-3748 / 2018
  • Text data distribution is often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades the performance of classifiers. Many studies have been conducted on this problem, and the proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as one of the solutions for the imbalance problem. In this paper, a novel one-sided feature selection method known as probabilistic feature selection (PFS) is presented for imbalanced text classification. PFS is a probabilistic method that is calculated using the feature distribution. Compared to similar methods, PFS has more parameters. To evaluate the performance of the proposed method, the feature selection methods Gini, MI, FAST and DFS were implemented for comparison, and the C4.5 decision tree and Naive Bayes classifiers were used. Test results on the Reuters-21578 and WebKB datasets, measured by F-measure, suggested that the proposed feature selection significantly improved the performance of the classifiers.

State-Dependent Weighting of Multiple Feature Parameters in HMM Recognizer (HMM 인식기에서 상태별 다중 특징 파라미터 가중)

  • 손종목;배건성
    • The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.47-52 / 1999
  • In this paper, we propose a new approach to weighting each feature parameter by considering the dispersion of the feature parameters and their degree of contribution to the recognition rate. We determine the total distribution factor, which is proportional to the recognition rate of each feature parameter, and the dispersion factor, which depends on the dispersion of each feature parameter. Then, we determine the state-dependent weighting using the total distribution factor and the dispersion factor. To verify the validity of the proposed approach, recognition experiments were performed using a PLU (Phoneme-Like Unit)-based HMM. Experimental results showed an improvement of 7.7% in the recognition rate using the proposed method.
