• Title/Abstract/Keywords: gaussian mixture models

99 results, processing time 0.021 s

결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할 (Segmentation of Color Image Using the Deterministic Annealing EM Algorithm)

  • 박종현;박순영;조완현
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 1999년도 추계종합학술대회 논문집
    • /
    • pp.569-572
    • /
    • 1999
  • In this paper we present a color image segmentation algorithm based on statistical models. A novel deterministic annealing Expectation Maximization (EM) formula is derived to estimate the parameters of the Gaussian Mixture Model (GMM), which represents the multi-colored objects statistically. The experimental results show that the proposed deterministic annealing EM provides a globally optimal solution for ML parameter estimation and that the image field is segmented efficiently by using the parameter estimates.


SVM을 이용한 자동 음소분할에 관한 연구 (Research about auto-segmentation via SVM)

  • 권호민;한학용;김창근;허강인
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003년도 하계종합학술대회 논문집 Ⅳ
    • /
    • pp.2220-2223
    • /
    • 2003
  • In this paper we used Support Vector Machines (SVMs), recently proposed as a learning method in the artificial neural network family, to divide continuous speech into phonemes (initial, medial, and final sounds), and then performed continuous speech recognition on the result. The decision boundary of each phoneme is determined by an algorithm based on the maximum frequency in a short interval. Recognition is performed by a Continuous Hidden Markov Model (CHMM), and we compared the result with phonemes segmented by visual inspection. The experiments confirmed that the proposed SVM-based method is more effective for initial sounds than Gaussian Mixture Models (GMMs).


A Fast EM Algorithm for Gaussian Mixtures

  • Jung, Hye-Kyung;Seo, Byung-Tae
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 1
    • /
    • pp.157-168
    • /
    • 2012
  • The EM algorithm is the most important tool to obtain the maximum likelihood estimator in finite mixture models due to its stability and simplicity. However, its convergence rate is often slow because the conventional EM algorithm is based on a large missing data space. Several techniques have been proposed in the literature to reduce the missing data space. In this paper, we review existing methods and propose a new EM algorithm for Gaussian mixtures, which reduces the missing data space while preserving the stability of the conventional EM algorithm. The performance of the proposed method is evaluated with other existing methods via simulation studies.
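For reference, the conventional EM iteration that this abstract takes as its baseline can be sketched as follows. This is a minimal 1-D illustration, not the accelerated algorithm proposed in the paper; the function name `em_gmm_1d`, the variance floor, the starting values, and the synthetic data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50, var_floor=1e-6):
    """Conventional EM for a 1-D Gaussian mixture (hypothetical helper).

    Returns (weights, means, variances, trace), where trace holds the
    log-likelihood evaluated at the start of each iteration.
    """
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)  # assumed floor to avoid collapse
    return weights, means, variances, trace


# Synthetic two-component data (assumed for illustration only).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, trace = em_gmm_1d(data, [-1.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Every data point contributes to every component in the E-step, which is the "large missing data space" the abstract identifies as the cause of slow convergence.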

시변 잡음에 대처하기 위한 다중 모델을 이용한 PCMM 기반 특징 보상 기법 (PCMM-Based Feature Compensation Method Using Multiple Model to Cope with Time-Varying Noise)

  • 김우일;고한석
    • 한국음향학회지
    • /
    • Vol. 23, No. 6
    • /
    • pp.473-480
    • /
    • 2004
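  • This paper proposes an effective speech-model-based feature compensation method for robust speech recognition in noisy environments. The proposed method is based on the parallel combined mixture model (PCMM). The conventional PCMM-based method requires a complex mixture model combination step for every input utterance in order to reflect time-varying noise. The proposed method copes with time-varying background noise by interpolating multiple mixture models. A data-driven method is introduced to generate more reliable mixture models, and a frame-synchronous estimation of the environment posterior probability is proposed for real-time processing. To prevent the increase in computation caused by the multiple models, a mixture-sharing technique is proposed: statistically similar components among the Gaussian mixture models are selected to generate the common model needed for sharing. Performance is evaluated on the Aurora 2.0 database and on a speech database collected in real car-driving conditions. The experimental results confirm that the proposed method yields robust speech recognition performance in both simulated and real noise environments and is effective in reducing computation.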

프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현 (Realization of a Text-Independent Speaker Identification System with Frame-Level Likelihood Normalization)

  • 김민정;석수영;김광수;정현열
    • 융합신호처리학회논문지
    • /
    • Vol. 3, No. 1
    • /
    • pp.8-14
    • /
    • 2002
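  • In this paper, we implement a real-time text-independent speaker identification system using Gaussian mixture models and carry out recognition experiments. To improve the performance of the system, we apply likelihood normalization, which has given good results in speaker verification systems. The system consists of three stages: preprocessing, speaker model generation, and speaker identification. In preprocessing, cepstral mean normalization (CMN) and silence removal are applied to account for variation in the speaker's utterances. In speaker model generation, speaker models are built with the Gaussian mixture model (GMM), which represents the acoustic characteristics of a speaker's utterances well, and the GMM parameters are optimized by maximum likelihood estimation (MLE). In speaker identification, the likelihood is computed by maximum likelihood (ML) from the trained data and the test data; when likelihood normalization is applied, the likelihood is computed frame by frame. The computed likelihood is expressed as a score ($S_C$), and the speaker with the highest score is chosen as the identified speaker. Text-independent sentences were used as the utterances for speaker recognition. The ETRI445 and KLE452 databases were used for the experiments, and only cepstral coefficients and regression coefficients were used as feature parameters. The experiments varied the number of enrolled speakers and compared the conventional speaker identification method with the frame-level likelihood normalization method. The results show that frame-level likelihood normalization achieves a higher recognition rate than the conventional method as the number of speakers increases.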


Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal
    • /
    • Vol. 35, No. 1
    • /
    • pp.100-108
    • /
    • 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.

얼굴인증 방법들의 조명변화에 대한 견인성 연구 (Study On the Robustness Of Four Different Face Authentication Methods Under Illumination Changes)

  • 고대영;천영하;김진영;이주헌
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003년도 하계종합학술대회 논문집 Ⅳ
    • /
    • pp.2036-2039
    • /
    • 2003
  • This paper studies the robustness of face authentication methods under illumination changes. Four different face authentication methods are tried: Principal Component Analysis, Gaussian Mixture Models, 1-Dimensional Hidden Markov Models, and 2-Dimensional Hidden Markov Models. Experimental results involving an artificial illumination change applied to face images are compared with each other. A face feature vector extraction method based on the 2-Dimensional Discrete Cosine Transform is used. Experiments to evaluate the four face authentication methods are carried out on the Olivetti Research Laboratory (ORL) face database. The best EER (Equal Error Rate) performance is observed for the pseudo 2D HMM.


가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가 (Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting)

  • 김형순;김영국;신영욱
    • 음성과학
    • /
    • Vol. 10, No. 3
    • /
    • pp.225-239
    • /
    • 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and the Gaussian Mixture Model (GMM) are considered. For post-processing, we employ a likelihood ratio scoring method to verify the recognition results, and filler models, anti-subword models, and N-best decoding results are considered as alternative hypotheses for likelihood ratio scoring. We also examine different methods to construct anti-subword models. We evaluate the performance of our system on the automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on Kullback-Leibler distance and the N-best decoding method show better performance than the other methods, and we could reduce keyword recognition errors by more than 50% at a keyword rejection rate of 5%.


화자인증 시스템에서 선정 방법에 관한 연구 (A Study on Background Speaker Selection Method in Speaker Verification System)

  • 최홍섭
    • 음성과학
    • /
    • Vol. 9, No. 2
    • /
    • pp.135-146
    • /
    • 2002
  • Generally, a speaker verification system improves its recognition rate by normalizing the log likelihood ratio, using the model of the speaker to be verified and a background speaker model. The speaker-based cohort method is widely used for selecting the background speaker model. Recently, the Gaussian-based cohort model has been suggested as a virtually synthesized cohort model. Unlike a speaker-based model, this method chooses, from the probability distributions of several neighboring speakers, only those close to the probability distribution of the base speaker, and synthesizes a new virtual speaker model from them. It shows better results than the existing speaker-based method. This study compared the existing speaker-based background speaker models and the virtual speaker models, and then constructed new virtual background speaker model groups that combine them in a certain ratio. For this, the study constructed a speaker verification system that uses the GMM (Gaussian Mixture Model), and found that the suggested method of selecting virtual background speaker models shows improved performance.
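The log-likelihood-ratio normalization mentioned in this abstract can be illustrated with a small sketch. It assumes 1-D features, GMMs given as lists of `(weight, mean, variance)` tuples, and a background score formed by averaging the cohort likelihoods; these choices, and the names `gmm_log_likelihood` and `llr_score`, are assumptions for illustration and not the cohort-selection method the paper proposes.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of 1-D frames under a GMM.

    gmm is a list of (weight, mean, variance) tuples (assumed representation).
    """
    total = 0.0
    for x in frames:
        p = sum(
            w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in gmm
        )
        total += math.log(p)
    return total / len(frames)


def llr_score(frames, claimed_model, background_models):
    """Log likelihood ratio of the claimed speaker against a cohort.

    The background term is the log of the mean cohort likelihood,
    computed with a log-mean-exp for numerical stability.
    """
    target = gmm_log_likelihood(frames, claimed_model)
    bg = [gmm_log_likelihood(frames, g) for g in background_models]
    peak = max(bg)
    log_mean_bg = peak + math.log(sum(math.exp(b - peak) for b in bg) / len(bg))
    return target - log_mean_bg


# Toy models and utterances (assumed values, for illustration only).
claimed = [(1.0, 0.0, 1.0)]
cohort = [[(1.0, 3.0, 1.0)], [(1.0, -3.0, 1.0)]]
genuine_score = llr_score([0.1, -0.2, 0.3, 0.0], claimed, cohort)
impostor_score = llr_score([2.9, 3.1, 3.0, 2.8], claimed, cohort)
```

A verification decision would then compare the score against a threshold; which speakers or Gaussian components go into the cohort is exactly the design question the study addresses.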


Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events

  • Ashok Kumar, P.M.;Vaidehi, V.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 9, No. 1
    • /
    • pp.169-189
    • /
    • 2015
  • Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at a junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. In order to detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the detection of foreground objects, tracking is carried out using block motion estimation by the three-step search method. The primitive events of an object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing the temporal relations between the primitive events of various objects. This is repeated for each window of sequences, and the support for each temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike traditional frequent itemset mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.