• Title/Summary/Keyword: Gaussian Mixture models


A Post-processing for Binary Mask Estimation Toward Improving Speech Intelligibility in Noise (잡음환경 음성명료도 향상을 위한 이진 마스크 추정 후처리 알고리즘)

  • Kim, Gibak
    • Journal of Broadcast Engineering / v.18 no.2 / pp.311-318 / 2013
  • This paper deals with a noise reduction algorithm that uses binary masking in the time-frequency domain. To improve speech intelligibility in noise, noise-masked speech is decomposed into time-frequency units, and mask "0" is assigned to masker-dominant regions, removing the time-frequency units where noise dominates speech. In previous research, Gaussian mixture models were used to classify speech-dominant and noise-dominant regions, which correspond to mask "1" and mask "0", respectively. In each frequency band, data were collected to train the Gaussian mixture models, and a detection procedure was applied to the test data to decide whether each time-frequency unit belongs to the speech-dominant or the noise-dominant region. In this paper, we consider the correlation of masks in the frequency domain and propose a post-processing method that exploits the Viterbi algorithm.
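As a rough illustration of the post-processing idea, the sketch below runs the Viterbi algorithm across the frequency bands of a single frame. It assumes each band already has log-likelihoods for the noise-dominant (mask 0) and speech-dominant (mask 1) classes, for example from per-band GMMs. The function name `viterbi_mask`, the "stay" probability `p_stay`, and the toy scores are hypothetical choices for illustration, not the paper's settings.

```python
import math

def viterbi_mask(loglik, p_stay=0.8):
    """Smooth binary masks across frequency bands with the Viterbi algorithm.

    loglik: list of (ll_mask0, ll_mask1) pairs, one per frequency band.
    p_stay: assumed probability that adjacent bands share the same mask.
    Returns the most likely mask sequence as a list of 0/1 values.
    """
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    # Uniform prior over the mask of the first band.
    score = [math.log(0.5) + loglik[0][0], math.log(0.5) + loglik[0][1]]
    back = []
    for ll in loglik[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best predecessor state for mask value s in this band.
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + ll[s])
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy scores (hypothetical): the middle band weakly prefers mask 0
# while its neighbours are clearly speech-dominant.
toy = [(-5.0, -1.0), (-5.0, -1.0), (-1.0, -1.5), (-5.0, -1.0), (-5.0, -1.0)]
independent = [0 if a >= b else 1 for a, b in toy]  # per-band decision
smoothed = viterbi_mask(toy)                        # decision with band correlation
```

In this toy case the per-band decision flips the middle band to mask 0, while the smoothed path keeps it at mask 1 because two mask switches cost more than the small likelihood gain.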

Segmentation of Color Image using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할)

  • Cho, Wan-Hyun;Park, Jong-Hyun;Park, Soon-Young
    • Journal of KIISE:Databases / v.28 no.3 / pp.324-333 / 2001
  • In this paper we present a novel color image segmentation algorithm based on a Gaussian Mixture Model (GMM). We introduce a Deterministic Annealing Expectation Maximization (DAEM) algorithm, developed using the principle of maximum entropy, to overcome the local maxima problem associated with the standard EM algorithm. In our approach, the GMM is used to represent multi-colored objects statistically, and its parameters are estimated by the DAEM algorithm. We also develop a method for automatically determining the number of components in the Gaussian mixture model. The image is segmented according to the maximum posterior probability, which is calculated using the GMM. The experimental results show that the proposed DAEM estimates the parameters more accurately than the standard EM and that the method for determining the number of mixture components is very efficient. When tested on two natural images, the proposed algorithm performs much better than the traditional algorithm in segmenting the image fields.
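The key difference between deterministic annealing EM and standard EM is usually described as a tempered E-step: the posterior is computed from the joint density raised to a power β, and β is increased toward 1 over the course of training. The sketch below shows only that E-step for a univariate mixture; the M-step and the annealing schedule are omitted. The function names and the toy parameters are illustrative assumptions, and the one-dimensional setting is a simplification of the color (multivariate) case in the abstract.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def tempered_posterior(x, weights, means, variances, beta):
    """Posterior over components with the joint density raised to the power beta.

    beta = 1 gives the standard EM posterior; beta near 0 flattens it
    toward a uniform distribution over components.
    """
    logs = [beta * (math.log(w) + gauss_logpdf(x, m, v))
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                      # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-component mixture evaluated at x = 1.0.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
soft = tempered_posterior(1.0, weights, means, variances, beta=0.1)
standard = tempered_posterior(1.0, weights, means, variances, beta=1.0)
```

At low β the assignment of `x` stays close to even between the two components, and it sharpens toward the standard posterior as β approaches 1.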


Improvement of Semicontinuous Hidden Markov Models and One-Pass Algorithm for Recognition of Keywords in Korean Continuous Speech (한국어 연속음성중 키워드 인식을 위한 반연속 은닉 마코브 모델과 One-Pass 알고리즘의 개선방안)

  • 최관선
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.358-363 / 1994
  • This paper presents improvements to the SCHMM using discrete VQ and to the One-Pass algorithm for keyword recognition in Korean continuous speech. The SCHMM using discrete VQ is a simple model composed of a variable-mixture Gaussian probability density function with a dynamic number of mixtures. The One-Pass algorithm is improved so that recognition rates are enhanced by fathoming any undesirable semisyllable with low likelihood and high duration penalty, and computation time is reduced by testing only frames that are dissimilar to the previously tested frame. In recognition experiments for the speaker-dependent case, the improved One-Pass algorithm achieved recognition rates as high as 99.7% and reduced computation time by about 30% compared with the currently available One-Pass algorithm.


Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training (MCE 학습 알고리즘을 이용한 문장독립형 화자식별의 성능 개선)

  • Kim Tae-Jin;Choi Jae-Gil;Kwon Chul-Hong
    • MALSORI / no.57 / pp.165-174 / 2006
  • In this paper we use a training algorithm, MCE (Minimum Classification Error), to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small-set speaker identification task show that the discriminant training method using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian Mixture Model) speaker models from a UBM (Universal Background Model).
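For context on the baseline mentioned above, Bayesian adaptation of a speaker GMM from a UBM is commonly done with mean-only relevance MAP, where each adapted mean interpolates between the UBM mean and the speaker's data. The sketch below shows that update for scalar features. It is an assumption that the baseline follows this common recipe; the function name, the relevance factor, and the toy inputs are hypothetical, and MCE training itself is not shown.

```python
def map_adapt_means(ubm_means, posteriors, frames, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D GMM from a UBM.

    ubm_means:  UBM component means.
    posteriors: posteriors[i][k] = P(component k | frame i) under the UBM.
    frames:     scalar feature values from the target speaker.
    relevance:  assumed relevance factor controlling trust in the new data.
    """
    adapted = []
    for k, ubm_mean in enumerate(ubm_means):
        n_k = sum(p[k] for p in posteriors)          # soft frame count
        if n_k == 0.0:
            adapted.append(ubm_mean)                 # no data: keep the UBM mean
            continue
        e_k = sum(p[k] * x for p, x in zip(posteriors, frames)) / n_k
        alpha = n_k / (n_k + relevance)              # data-dependent mixing weight
        adapted.append(alpha * e_k + (1.0 - alpha) * ubm_mean)
    return adapted

# Toy input (hypothetical): four frames, all assigned to component 0.
ubm_means = [0.0, 10.0]
frames = [1.0, 1.0, 1.0, 1.0]
posteriors = [[1.0, 0.0]] * 4
speaker_means = map_adapt_means(ubm_means, posteriors, frames, relevance=4.0)
```

Components that see little speaker data stay near the UBM, which is why this scheme is often used when enrollment data are limited.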


A Study on Analysis of Abdominal EMG Using HMM-GMM Algorithm (HMM-GMM 방식을 이용한 복부 근전도 분석에 관한 연구)

  • Gwon, Jang-U;Kim, Jeong-Ho;Kim, Hyeon-Seong;Yun, Dong-Eop;Choe, Heung-Ho
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2007.05a / pp.121-124 / 2007
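  • Obesity, a cause of various diseases, has recently emerged as a serious social problem, and the need for measurement systems that support obesity management is growing. This paper studies a measurement system that analyzes abdominal electromyogram (EMG) signals for obesity management, so that users can check their own health status anytime and anywhere and receive appropriate medical services. For abdominal EMG signal analysis, we propose algorithms for energy detection, signal feature extraction, and state classification and recognition. We propose a system that applies this signal analysis algorithm to the measurement system to assess abdominal obesity and abdominal muscle strength and thereby provide an appropriate evaluation of health status.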


Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.25 no.3E / pp.89-94 / 2006
  • For audio indexing and targeted search of specific audio or corresponding visual contents, the MPEG-7 standard has adopted a sound classification framework in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for classification of various sounds. MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimensional reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA) and Discrete Cosine Transformation (DCT). In this paper we compare the performance of different dimensional reduction methods with Gaussian mixture models (GMMs) and HMMs in classifying video sound clips.

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.159-162 / 2005
  • Speaker identification is the process of automatically identifying who is speaking on the basis of information obtained from speech waves. In the training phase, each speaker's model is trained using that speaker's speech data. GMMs (Gaussian Mixture Models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not effective when training data are insufficient. This paper proposes a speaker modeling method using MLLR (Maximum Likelihood Linear Regression), which is used for speaker adaptation in speech recognition. We build an SD-like model using the MLLR adaptation method instead of a speaker-dependent (SD) model. The proposed system outperforms GMMs in a small training data environment.


Study On The Robustness Of Face Authentication Methods Under Illumination Changes (얼굴인증 방법들의 조명변화에 대한 견인성 비교 연구)

  • Ko Dae-Young;Kim Jin-Young;Na Seung-You
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.9-16 / 2005
  • This paper focuses on the study of the face authentication system and the robustness of face authentication methods under illumination changes. Four different face authentication methods are tried: PCA (Principal Component Analysis), GMM (Gaussian Mixture Models), 1D HMM (1-Dimensional Hidden Markov Models), and Pseudo 2D HMM (Pseudo 2-Dimensional Hidden Markov Models). Experimental results involving an artificial illumination change applied to face images are compared with each other. Face feature vector extraction based on the 2D DCT (2-Dimensional Discrete Cosine Transform) is used. Experiments to evaluate the four face authentication methods are carried out on the ORL (Olivetti Research Laboratory) face database. Experimental results show that the EER (Equal Error Rate) performance degrades in all cases as $\delta$ varies. With no illumination change, the EER is 2.54% for Pseudo 2D HMM, 3.18% for 1D HMM, 11.7% for PCA, and 13.38% for GMM. The 1D HMM performs better than PCA when there is no illumination change, but worse than PCA when the illumination change is large ($\delta \geq 40$). The Pseudo 2D HMM gives the best EER performance regardless of the illumination changes.
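The EER quoted above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be approximated from verification scores is given below, assuming a higher score means "more likely genuine". The function name and the score lists are hypothetical and are not taken from the paper's experiments.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over the observed scores.

    genuine:  scores of true-identity trials (higher = more likely accepted).
    impostor: scores of impostor trials.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(1 for s in genuine if s < t) / len(genuine)     # genuine rejected
        far = sum(1 for s in impostor if s >= t) / len(impostor)  # impostor accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Hypothetical scores: one genuine trial scores below one impostor trial.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With only a handful of trials the FAR and FRR curves are step functions, so the result is an approximation; larger score sets give a finer estimate.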

Voice Activity Detection in Noisy Environment based on Statistical Nonlinear Dimension Reduction Techniques (통계적 비선형 차원축소기법에 기반한 잡음 환경에서의 음성구간검출)

  • Han Hag-Yong;Lee Kwang-Seok;Go Si-Yong;Hur Kang-In
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.5 / pp.986-994 / 2005
  • This paper proposes a likelihood-based nonlinear dimension reduction method for speech feature parameters in order to construct a voice activity detector that is adaptable to noisy environments. The proposed method uses the nonlinear values of the Gaussian probability density function with new parameters for the speech/nonspeech classes. We adapted the Likelihood Ratio Test to find speech segments and compared its performance with that of the Linear Discriminant Analysis technique. In experiments we found that the proposed method gives results similar to those of Gaussian Mixture Models.
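A likelihood ratio test for speech/nonspeech decisions can be sketched in a few lines. The example below assumes a single scalar feature per frame and one Gaussian per class, which is a deliberate simplification; the class parameters, the threshold, and the function names are hypothetical and do not come from the paper.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def is_speech(feature, speech=(5.0, 4.0), nonspeech=(1.0, 1.0), threshold=0.0):
    """Likelihood ratio test on one frame.

    speech, nonspeech: assumed (mean, variance) of the feature in each class.
    threshold:         log-likelihood-ratio threshold; 0 means equal priors.
    """
    llr = gauss_logpdf(feature, *speech) - gauss_logpdf(feature, *nonspeech)
    return llr > threshold

# Hypothetical per-frame feature values (for example, log energy).
decisions = [is_speech(f) for f in (4.5, 1.2)]
```

Raising `threshold` makes the detector more conservative about declaring speech, which trades missed speech frames for fewer false alarms in noise.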

Statistical Inference in Non-Identifiable and Singular Statistical Models

  • Amari, Shun-ichi;Ozeki, Tomoko
    • Journal of the Korean Statistical Society / v.30 no.2 / pp.179-192 / 2001
  • When a statistical model has a hierarchical structure, such as multilayer perceptrons in neural networks or Gaussian mixture density representations, the model includes distributions with unidentifiable parameters when the structure becomes redundant. Since the exact structure is unknown, we need to carry out statistical estimation or learning of parameters in such a model. From the geometrical point of view, distributions specified by unidentifiable parameters become a singular point in the parameter space. The problem has been noted in many statistical models, and strange behaviors of the likelihood ratio statistics, when the null hypothesis is at a singular point, have been analyzed so far. The present paper studies asymptotic behaviors of the maximum likelihood estimator and the Bayesian predictive estimator by using a simple cone model, and shows that they are completely different from those in regular statistical models where the Cramér-Rao paradigm holds. At singularities, the Fisher information metric degenerates, implying that the Cramér-Rao paradigm no longer holds and that classical model selection theory such as AIC and MDL cannot be applied. This paper is a first step toward establishing a new theory for analyzing the accuracy of estimation or learning around singularities.
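To make the degeneracy concrete, a small worked example with a two-component Gaussian mixture may help. This is an illustrative case chosen here, not the cone model analyzed in the paper; $\phi$ denotes the standard normal density, and the parameters $w$ and $\mu$ are notation introduced for this sketch.

```latex
\begin{align*}
p(x; w, \mu) &= (1 - w)\,\phi(x) + w\,\phi(x - \mu), \qquad 0 \le w \le 1, \\[4pt]
\frac{\partial \log p}{\partial w} &= \frac{\phi(x - \mu) - \phi(x)}{p(x; w, \mu)}, \qquad
\frac{\partial \log p}{\partial \mu} = \frac{w\,(x - \mu)\,\phi(x - \mu)}{p(x; w, \mu)}, \\[4pt]
\left.\frac{\partial \log p}{\partial w}\right|_{\mu = 0} &= 0, \qquad
\left.\frac{\partial \log p}{\partial \mu}\right|_{\mu = 0} = w\,x, \\[4pt]
G(w, 0) &= \mathbb{E}\!\left[\nabla \log p \,(\nabla \log p)^{\top}\right]
= \begin{pmatrix} 0 & 0 \\ 0 & w^{2} \end{pmatrix}, \qquad \det G(w, 0) = 0 .
\end{align*}
```

At $\mu = 0$ the model equals $\phi(x)$ for every $w$, so $w$ is unidentifiable and the Fisher information matrix is singular along that whole line of parameters.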
