• Title/Summary/Keyword: gaussian mixture models

Minimum Classification Error Training to Improve Discriminability of PCMM-Based Feature Compensation (PCMM 기반 특징 보상 기법에서 변별력 향상을 위한 Minimum Classification Error 훈련의 적용)

  • Kim Wooil;Ko Hanseok
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.58-68 / 2005
  • In this paper, we propose a scheme to improve the discriminative property of the feature compensation method for robust speech recognition in noisy environments. The noisy speech model estimation used in existing feature compensation methods does not guarantee posterior probabilities that discriminate reliably among the Gaussian components. Estimating the posterior probabilities is a crucial step in determining the discriminative power of the Gaussian models, which in turn determines the intelligibility of the restored speech signals. The proposed scheme employs minimum classification error (MCE) training to estimate the parameters of the noisy speech model. To apply MCE training, we propose to identify the 'competing components' that are expected to affect the discriminative ability. The proposed method is applied to feature compensation based on the parallel combined mixture model (PCMM). Performance is examined on the Aurora 2.0 database and on speech recorded inside a car under real driving conditions. The experimental results show improved recognition performance in both simulated environments and real-life conditions, verifying the effectiveness of the proposed scheme for robust speech recognition systems.

IR Image Segmentation using GrabCut (GrabCut을 이용한 IR 영상 분할)

  • Lee, Hee-Yul;Lee, Eun-Young;Gu, Eun-Hye;Choi, Il;Choi, Byung-Jae;Ryu, Gang-Soo;Park, Kil-Houm
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.2 / pp.260-267 / 2011
  • This paper proposes a method for segmenting objects from the background in IR (infrared) images based on the GrabCut algorithm. GrabCut needs a window that encloses the known object of interest, and this window is normally specified by the user. To apply GrabCut to object recognition in image sequences, however, the window location must be determined automatically. For this, we adopt Otsu's algorithm to coarsely segment the unknown objects of interest in an image, and then locate the window automatically by blob analysis. GrabCut also needs the probability distributions of both the candidate object region and the background region closely surrounding the object in order to estimate the Gaussian mixture models (GMMs) of the object and the background. The probability distribution of the background is computed from a background window that contains the same number of pixels as the candidate object region. Experiments on various IR images show that the proposed method is suitable for segmenting the object of interest in IR image sequences. To evaluate its performance, we compare the proposed method with other segmentation methods.
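
The automatic window placement described in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names are hypothetical, the "blob analysis" step is simplified (as an assumption) to the bounding box of all pixels above the threshold, and the frame is synthetic. Only NumPy is used.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)          # pixel count at or below each bin
    w1 = w0[-1] - w0                            # pixel count above each bin
    cum = np.cumsum(hist * centers)
    m0 = cum / np.maximum(w0, 1.0)              # mean of the lower class
    m1 = (cum[-1] - cum) / np.maximum(w1, 1.0)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2          # proportional to between-class variance
    return centers[np.argmax(between)]

def foreground_window(image):
    """Inclusive bounding box (row0, row1, col0, col1) of pixels above the Otsu threshold.

    Assumption: a single bounding box stands in for the blob analysis step.
    """
    mask = image > otsu_threshold(image)
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Synthetic "IR" frame: a warm 3x4 object on a cold background.
frame = np.zeros((10, 12))
frame[2:5, 3:7] = 200.0
window = foreground_window(frame)
```

A window found this way could then be passed as the initial rectangle to a GrabCut implementation; a real pipeline would need connected-component labeling to handle several objects.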

Extensions of LDA by PCA Mixture Model and Class-wise Features (PCA 혼합 모형과 클래스 기반 특징에 의한 LDA의 확장)

  • Kim Hyun-Chul;Kim Daijin;Bang Sung-Yang
    • Journal of KIISE:Software and Applications / v.32 no.8 / pp.781-788 / 2005
  • LDA (Linear Discriminant Analysis) is a data discrimination technique that seeks a transformation maximizing the ratio of the between-class scatter to the within-class scatter. While it has been successfully applied in several applications, it has two limitations, both concerning the underfitting problem. First, it fails to discriminate data with complex distributions, since all data in each class are assumed to follow a Gaussian distribution; second, it can lose class-wise information, since it produces only one transformation over the entire range of classes. We propose three extensions of LDA to overcome these problems. The first extension addresses the first problem by modeling the within-class scatter with a PCA mixture model, which can represent more complex distributions. The second extension addresses the second problem by using a different transformation for each class in order to provide class-wise features. The third extension combines these two modifications by representing each class with the PCA mixture model and using a different transformation for each mixture component. All of the proposed extensions are shown to outperform LDA in terms of classification error for handwritten digit recognition and alphabet recognition.

A Speaker Pruning Method for Reducing Calculation Costs of Speaker Identification System (화자식별 시스템의 계산량 감소를 위한 화자 프루닝 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.457-462 / 2003
  • In this paper, we propose a speaker pruning method for real-time processing and improved performance in a speaker identification system based on the GMM (Gaussian mixture model). Conventional speaker identification methods, such as ML (Maximum Likelihood), WMR (Weighting Model Rank), and MWMR (Modified WMR), calculate frame likelihoods using all frames of each input utterance and all of the speaker models, and then select the speaker with the largest accumulated likelihood. In these methods, however, calculation cost and processing time grow as the number of input frames and speakers increases. To solve this problem, the proposed method uses only part of the input frames to select the subset of speaker models with higher likelihoods, and the identified speaker is then decided by evaluating the selected speaker models. The method can also be applied to improve identification performance even when the number of speakers changes. In several experiments, the proposed method reduced calculation cost by 65% and increased the identification rate by 2% compared with conventional methods. These results mean that the proposed method can be applied effectively to real-time processing and to improving performance in speaker identification.
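
The two-stage idea in this abstract (prune speakers on part of the frames, then decide among the survivors) can be sketched roughly as follows. This is an illustrative assumption, not the authors' algorithm: the function name, the parameters `n_prune_frames` and `n_keep`, and the synthetic per-frame log-likelihood matrix are all hypothetical. In a real system each entry would come from evaluating one speaker's GMM on one frame.

```python
import numpy as np

def identify_with_pruning(frame_loglik, n_prune_frames, n_keep):
    """Pick a speaker from an (n_speakers, n_frames) matrix of per-frame log-likelihoods."""
    # Stage 1: rank every speaker using only the first n_prune_frames frames.
    partial = frame_loglik[:, :n_prune_frames].sum(axis=1)
    survivors = np.argsort(partial)[::-1][:n_keep]
    # Stage 2: accumulate over all frames, but only for the surviving speakers.
    full = frame_loglik[survivors].sum(axis=1)
    return int(survivors[np.argmax(full)]), survivors

# Synthetic scores: 20 speakers, 100 frames, speaker 3 is the true one.
rng = np.random.default_rng(0)
scores = rng.normal(-50.0, 1.0, size=(20, 100))
scores[3] += 2.0

best, kept = identify_with_pruning(scores, n_prune_frames=20, n_keep=5)

# Frame-level model evaluations needed with and without pruning.
evals_full = 20 * 100
evals_pruned = 20 * 20 + 5 * (100 - 20)
```

The evaluation counts show where the savings come from: after stage 1, the remaining frames are scored only against the kept models. The specific frame and speaker counts here are arbitrary and are not meant to reproduce the figures reported in the abstract.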

A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering (혼합 가우시안 군집화를 이용한 상태공유 음향모델 최적화)

  • Ann, Tae-Ock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.167-176 / 2005
  • This paper describes how a decision-tree-based state tying model, one of the acoustic models used for speech recognition, can be optimized by reducing the number of mixture Gaussians in the output probability distribution. State tying modeling uses a finite set of questions, which can incorporate phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we reduce the number of mixture Gaussians at the point of highest recognition rate by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. After the mean and variance are calculated for the pair with the lowest distance, a new Gaussian is created, and its parameters are derived from the parameters of the Gaussians from which it is formed. Experiments were performed using the STOCKNAME (1,680) databases. The test results show that the proposed method using the Bhattacharyya distance measure maintains the recognition rate at $97.2\%$ and reduces the ratio of the number of mixture Gaussians by $1.0\%$, while the method using the Euclidean distance measure maintains the recognition rate at $96.9\%$ and reduces the ratio of the number of mixture Gaussians by $1.0\%$. The methods can therefore optimize the state tying model.
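
The clustering step in this abstract (find the closest pair of Gaussians, replace it with one new Gaussian) can be sketched as below. Several things are assumptions rather than details from the paper: diagonal covariances, the moment-matching merge rule, the greedy loop, and all function names. The Bhattacharyya distance formula for two Gaussians is standard.

```python
import itertools
import numpy as np

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    v = 0.5 * (v1 + v2)
    return float(np.sum(0.125 * (m1 - m2) ** 2 / v
                        + 0.5 * np.log(v / np.sqrt(v1 * v2))))

def merge(w1, m1, v1, w2, m2, v2):
    """Moment-matched merge (assumed rule): preserves the pair's weight, mean, and variance."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v

def reduce_mixture(weights, means, variances, target):
    """Greedily merge the closest pair until only `target` components remain."""
    comps = list(zip(weights, means, variances))
    while len(comps) > target:
        i, j = min(
            itertools.combinations(range(len(comps)), 2),
            key=lambda p: bhattacharyya(comps[p[0]][1], comps[p[0]][2],
                                        comps[p[1]][1], comps[p[1]][2]),
        )
        merged = merge(*comps[i], *comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps

# Three 2-D components; the first two nearly coincide and should be merged.
weights = np.array([0.3, 0.3, 0.4])
means = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
variances = np.ones((3, 2))
reduced = reduce_mixture(weights, means, variances, target=2)
```

Swapping `bhattacharyya` for a Euclidean distance between the means would give the second measure mentioned in the abstract; unlike the Bhattacharyya distance, it ignores the variances.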

On the Use of Various Resolution Filterbanks for Speaker Identification

  • Lee, Bong-Jin;Kang, Hong-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.26 no.3E / pp.80-86 / 2007
  • In this paper, we utilize generalized warped filterbanks to improve the performance of speaker recognition systems. First, the performance of speaker identification systems is analyzed by varying the type of warped filterbank. Based on the finding that the error pattern of the recognition system differs depending on the type of filterbank used, we combine the likelihood values of statistical models built from features extracted with multiple warped filterbanks. Simulation results with the TIMIT and NTIMIT databases verify that the proposed system improves the identification rate by 31.47% and 15.14% in relative terms compared with the conventional system.

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI / no.52 / pp.111-120 / 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle a vocabulary of several hundred words as well as speaker independence. First, we develop an HMM-based speech recognition system on the PC that operates frame by frame, with parallel processing of feature extraction and Viterbi decoding to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and the phonetic tied mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. A maximum processing time of 29.2 ms per 32 ms frame of speech validates the real-time operation of the implemented system.

Network Intrusion Detection System Using Gaussian Mixture Models (가우시안 혼합 모델을 이용한 네트워크 침입 탐지 시스템)

  • Park Myung-Aun;Kim Dong-Kook;Noh Bong-Nam
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.130-132 / 2005
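  • With the explosive spread of high-speed networks, cases of network intrusion are also increasing, and interest in and research on intrusion detection systems as a means of detecting them are growing accordingly. Methods for detecting network intrusions include misuse detection, which looks for known attacks, and anomaly detection, which detects abnormal behavior. This paper proposes a new hybrid intrusion detection system that combines the two. Unlike existing hybrid approaches, it uses Gaussian mixture models for modeling network data and for detection. To evaluate the performance of the intrusion detection system based on Gaussian mixture models, we applied it to the DARPA'99 data. The experimental results showed that normal traffic and attacks were clearly separated, and that a considerable number of attacks could also be distinguished from one another.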

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

  • Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.4 no.3 / pp.95-101 / 2012
  • In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is an automatic speaker recognition system based on the GMM (Gaussian mixture model)-UBM (universal background model) framework and has been developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for the characteristics of recording devices. It also lets users manage reference models built from utterances recorded in various environments in order to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.

Speaker Identification Using Greedy Kernel PCA (Greedy Kernel PCA를 이용한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI / no.66 / pp.105-116 / 2008
  • In this research, we propose a speaker identification system using a kernel method, which is expected to model the non-linearity of speech features well. We have previously used principal component analysis (PCA) successfully, and we extend it to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, kernel PCA cannot be used directly for speaker identification because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing the eigenvectors of an $l{\times}l$ matrix), with the number of training vectors $l$. Therefore, we use greedy kernel PCA, which can approximate kernel PCA with a small representation error. In the experiments, we compare the accuracy of greedy kernel PCA with that of baseline Gaussian mixture models using MFCCs and PCA. The results with limited enrollment data show that greedy kernel PCA outperforms the conventional methods.
