• Title/Abstract/Keywords: Feature Distribution

Search results: 967 items (processing time 0.352 s)

Analysis of Classification Accuracy for Multiclass Problems

  • 최의선;이철희
    • 대한전자공학회:학술대회논문집 / Proceedings of the 대한전자공학회 2000 Summer Conference (4) / pp.190-193 / 2000
  • In this paper, we investigate the distribution of classification accuracies of multiclass problems in the feature space and analyze the performance of conventional feature extraction algorithms. To find this distribution, we sample the feature space and compute the classification accuracy at each sampling point. Experimental results show that there exist much better feature sets that conventional feature extraction algorithms fail to find. In addition, the distribution of classification accuracies is useful for developing and evaluating feature extraction algorithms.
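The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the synthetic three-class data, the restriction to one-dimensional linear projections, the nearest-class-mean classifier, and all function names (`make_data`, `accuracy_for_direction`, `accuracy_distribution`) are assumptions made for illustration. Accuracy is measured on the same data used to estimate the class means.

```python
import math
import random

def make_data(rng, n_per_class=100):
    """Three synthetic 2-D Gaussian classes (illustrative data only)."""
    means = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    data = []
    for label, (mx, my) in enumerate(means):
        for _ in range(n_per_class):
            data.append(((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)), label))
    return data

def accuracy_for_direction(data, theta):
    """Project onto the unit vector at angle theta, then classify each
    point by the nearest class mean in the projected 1-D feature."""
    w = (math.cos(theta), math.sin(theta))
    projected = [(x[0] * w[0] + x[1] * w[1], y) for x, y in data]
    labels = sorted({y for _, y in projected})
    class_means = {}
    for c in labels:
        values = [z for z, y in projected if y == c]
        class_means[c] = sum(values) / len(values)
    correct = sum(
        1 for z, y in projected
        if min(labels, key=lambda c: abs(z - class_means[c])) == y
    )
    return correct / len(projected)

def accuracy_distribution(data, n_samples=180):
    """Sample projection directions evenly over [0, pi) and record the
    classification accuracy obtained at each sampling point."""
    thetas = [math.pi * k / n_samples for k in range(n_samples)]
    return [(t, accuracy_for_direction(data, t)) for t in thetas]

rng = random.Random(0)
data = make_data(rng)
dist = accuracy_distribution(data)
best_theta, best_acc = max(dist, key=lambda p: p[1])
worst_theta, worst_acc = min(dist, key=lambda p: p[1])
print(f"best accuracy  {best_acc:.3f} at {math.degrees(best_theta):.0f} degrees")
print(f"worst accuracy {worst_acc:.3f} at {math.degrees(worst_theta):.0f} degrees")
```

The list `dist` is the sampled accuracy distribution. Comparing the accuracy of a direction chosen by a feature extraction algorithm against `best_acc` would show how far that algorithm is from the best sampled feature.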


A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / Vol. 13, No. 5 / pp.1397-1409 / 2017
  • In text categorization, selecting distinctive text features is important because the feature space is high-dimensional. Reducing the dimension of the feature space decreases processing time and can increase accuracy. This study introduces a novel statistical feature selection approach for text categorization. The approach measures the term distribution across all documents in the collection, the term distribution within a given category, and the term distribution in a given class relative to other classes. The results show that the proposed method outperforms traditional feature selection methods.

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • 대한음성학회지:말소리 / Vol. 55 / pp.131-142 / 2005
  • A mismatch between training and test conditions often causes a drastic decrease in the performance of speech recognition systems. In this paper, non-linear transformation techniques based on histogram equalization in the acoustic feature space are studied for reducing this mismatch. The purpose of histogram equalization (HEQ) is to convert the probability distribution of test speech into the probability distribution of training speech. Conventional histogram equalization methods consider only the probability distribution of the test speech, but for noise-corrupted test speech this probability distribution is itself distorted. The transformation function obtained from the distorted distribution may mis-transform feature vectors, which degrades the performance of histogram equalization. Therefore, this paper proposes a new method of calculating a noise-removed probability distribution under the assumption that the CDF of noisy speech feature vectors consists of a speech feature component and a noise feature component, and this compensated probability distribution is used in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. In the multi training condition, the proposed method also outperformed the baseline system.
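As a rough illustration of the conventional HEQ step described in this abstract, the sketch below maps each test feature value through the empirical CDF of the test data and then through the inverse empirical CDF of the training data. It does not implement the compensated distribution the paper proposes. The one-dimensional synthetic features and the function name `histogram_equalize` are assumptions.

```python
import bisect
import random

def histogram_equalize(test_values, train_values):
    """Map each test value x to the training value with the same empirical
    CDF level: y = F_train^{-1}(F_test(x))."""
    test_sorted = sorted(test_values)
    train_sorted = sorted(train_values)
    n, m = len(test_sorted), len(train_sorted)
    equalized = []
    for x in test_values:
        # rank = n * F_test(x); always >= 1 because x is one of the test samples
        rank = bisect.bisect_right(test_sorted, x)
        # index of the smallest training sample whose CDF is >= rank / n,
        # i.e. ceil(rank * m / n) - 1, computed in integer arithmetic
        idx = (rank * m + n - 1) // n - 1
        equalized.append(train_sorted[idx])
    return equalized

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
# Training features: zero mean, unit variance (synthetic)
train = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Test features: same kind of signal, but scaled and shifted (a simulated mismatch)
test = [0.5 * rng.gauss(0.0, 1.0) + 2.0 for _ in range(1000)]

equalized = histogram_equalize(test, train)
print(f"train mean     {mean(train):+.3f}")
print(f"test mean      {mean(test):+.3f}")
print(f"equalized mean {mean(equalized):+.3f}")
```

Because the mapping is monotonic, the ordering of the test values is preserved while their distribution is pulled toward the training distribution. When the test CDF is itself distorted by noise, this mapping inherits the distortion, which is the weakness the abstract describes.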


Learning Algorithm for Multiple Distribution Data using Haar-like Feature and Decision Tree

  • 곽주현;원일용;이창훈
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 2, No. 1 / pp.43-48 / 2013
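  • The AdaBoost algorithm is the most widely used algorithm for exploiting Haar-like features in face recognition. It is very fast and efficient, and it is highly effective on single-distribution data, where a single model image exists. However, its performance degrades on multiple-distribution models with two or more model images, such as recognition of mixed frontal and profile faces. This happens because it relies on a linear combination of single learning algorithms, which limits its range of application. To address this, this study proposes combining a decision tree with Haar-like features. To solve a broader range of problems with decision trees, we propose HDCT, a decision tree that uses Haar-like features, obtained by adapting the conventional decision tree to suit Haar-like features, and we compare its performance with AdaBoost.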

GMM Based Voice Conversion Using Kernel PCA

  • 한준희;배재현;오영환
    • 대한음성학회지:말소리 / No. 67 / pp.167-180 / 2008
  • This paper describes a novel spectral envelope conversion method based on a Gaussian mixture model (GMM). The core idea is to rearrange source feature vectors from the input space into transformed feature vectors in the feature space, for better GMM modeling of source and target features. The quality of statistical modeling depends on the distribution and the dimension of the data. The proposed method transforms both the distribution and the dimension of the data, which makes it possible to model the same data with a different configuration. Because the converted feature vectors must lie in the input space, only the source feature vectors are rearranged in the feature space using KPCA, and the target feature vectors remain unchanged for the joint pdf of source and target features. The experimental results show that the proposed method outperforms the conventional GMM-based conversion method in various training environments.


An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents

  • 강진범;양재영;최중민
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 34, No. 9 / pp.804-816 / 2007
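  • Much of the information collected for machine learning is unrelated to the concept to be learned or is redundant. The data itself may also contain errors. If the information collected to build a learning model cannot be trusted, it is difficult to acquire accurate knowledge during learning. Machine learning therefore uses feature selection to acquire accurate knowledge during the learning process. Feature selection improves the performance of a learning algorithm by removing information that is unrelated to the classes to be learned, or redundant, before the learning model is built. Existing feature selection methods assume that documents are evenly distributed in order to select appropriate features. In practice this is not the case, and it is very difficult to prepare training examples in which the number of documents or the document lengths are all equal. This paper proposes a feature selection method that considers the per-class impurity of words and the unbalanced distribution of documents in order to select features more effectively. Feature candidates that can represent a class are obtained by measuring word impurity, and features are then selected taking the unbalanced distribution of documents into account. Experiments demonstrate that the method achieves better performance.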
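The abstract does not give the impurity formula, so the sketch below only illustrates the general idea with an assumed measure: the Gini impurity of a word's document frequency across classes, optionally normalized by class size to compensate for an unbalanced collection. The function name `word_impurity`, the normalization scheme, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict

def word_impurity(docs, normalize_by_class_size=True):
    """Return {word: Gini impurity of the word's distribution over classes}.

    docs is a list of (tokens, label) pairs. With normalization, each class
    contributes the fraction of its documents containing the word, so a
    large class does not dominate simply because it has more documents.
    """
    class_sizes = Counter(label for _, label in docs)
    doc_freq = defaultdict(Counter)  # word -> class -> number of documents containing it
    for tokens, label in docs:
        for word in set(tokens):
            doc_freq[word][label] += 1

    impurity = {}
    for word, per_class in doc_freq.items():
        if normalize_by_class_size:
            weights = [count / class_sizes[c] for c, count in per_class.items()]
        else:
            weights = [float(count) for count in per_class.values()]
        total = sum(weights)
        impurity[word] = 1.0 - sum((w / total) ** 2 for w in weights)
    return impurity

# Toy unbalanced corpus: four "sports" documents, one "politics" document
docs = [
    (["goal", "match", "the"], "sports"),
    (["goal", "team", "the"], "sports"),
    (["match", "team", "the"], "sports"),
    (["goal", "the"], "sports"),
    (["vote", "the"], "politics"),
]

normalized = word_impurity(docs)
raw = word_impurity(docs, normalize_by_class_size=False)
for word in ("the", "goal", "vote"):
    print(f"{word:5s} normalized={normalized[word]:.2f} raw={raw[word]:.2f}")
```

In this toy corpus the word "the" occurs in every document of both classes. Its normalized impurity is 0.5, the maximum for two classes, while the raw count-based impurity is about 0.32, which makes it look more class-specific than it is. Words with low impurity, such as "goal" and "vote" here, would be the candidates for representing a class.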

Nonlinear Tolerance Allocation for Assembly Components

  • 김광수;최후곤
    • 산업공학 / Vol. 16, No. spc / pp.39-44 / 2003
  • As one of many design variables, dimension tolerances restrict the amount of size variation in a manufactured feature while ensuring functionality. In this study, a nonlinear integer model is formulated to allocate the optimal tolerance to each individual feature at a minimum manufacturing cost. While many previous tolerance allocation studies use a normal distribution, with its symmetrical property, to determine statistically worst tolerances, an asymmetrical distribution is more realistic because its mean does not always coincide with the process center. The model therefore allocates the optimal tolerance to a feature based on a beta distribution at a minimum total cost. The total cost as a function of tolerances is defined by machining cost and quality loss. After the convexity of the manufacturing cost is checked with the Hessian matrix, the model is solved by the Complex Method. Finally, a numerical example demonstrates successful implementation of the model for a nonlinear design case.

A Study on Acceptance of Customer of Digital Contents Distribution Site's for Economy Commerce

  • 이재광;권혁인
    • 통상정보연구 / Vol. 8, No. 4 / pp.3-22 / 2006
  • Recently, the use of and demand for digital contents have increased as the number of internet users has grown. Accordingly, sites that distribute digital contents commercially have become more important. Models for measuring the acceptance of web sites are being actively studied; however, web sites that deal in and distribute the specific goods called digital contents lack a sufficient theoretical base and model of customer acceptance. Existing research on the acceptance of commercial web sites is also limited in explaining systematically what influences the acceptance of digital contents distribution sites, because it directly connects the features of web sites, purchases on web sites, or the features of buyers with acceptance. For that reason, it is hard to reflect the features of digital contents. To measure customers' acceptance of web sites that distribute digital images, this research builds on the Technology Acceptance Model (TAM) by Davis. It identifies significant factors from a survey of users of digital image distribution sites, and shows that TAM, adapted to the analysis of acceptance of new sites, can explain the state of digital image distribution site use. The research also provides an evaluation of digital image distribution sites and an operating strategy for them as a new business model.


A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 8 / pp.3725-3748 / 2018
  • Text data distribution is often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades classifier performance. Many studies have addressed this problem. The proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method known as probabilistic feature selection (PFS) is presented for imbalanced text classification. The PFS is a probabilistic method that is calculated using the feature distribution. Compared to similar methods, the PFS has more parameters. To evaluate the performance of the proposed method, the feature selection methods Gini, MI, FAST and DFS were implemented for comparison, and the C4.5 decision tree and Naive Bayes classifiers were used. Test results on Reuters-21578 and WebKB, measured by F-measure, suggest that the proposed feature selection significantly improves the performance of the classifiers.

State-Dependent Weighting of Multiple Feature Parameters in HMM Recognizer

  • 손종목;배건성
    • 한국음향학회지 / Vol. 18, No. 4 / pp.47-52 / 1999
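  • This paper proposes a method for weighting each feature parameter that takes into account the variance of the feature parameter and its contribution to recognition performance. The overall contribution of each feature parameter is set in proportion to its recognition rate, and a weighting factor is set according to its variance. The state-dependent weight of each feature parameter is then set using the overall contribution and the variance-based weighting factor. To examine the effectiveness of the proposed method, recognition experiments were carried out with an HMM speech recognition system based on phoneme-like units. When the weights were set by the proposed method, the recognition rate improved by 7.7%.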
