• Title/Summary/Keyword: Eigenvoice

Performance Improvement of Rapid Speaker Adaptation Using Bias Compensation and Mean of Dimensional Eigenvoice Models (바이어스 보상과 차원별 Eigenvoice 모델 평균을 이용한 고속화자적응의 성능향상)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.383-389 / 2004
  • In this paper, we propose bias compensation methods and an eigenvoice method using the mean of dimensional eigenvoices to improve the performance of eigenvoice-based rapid speaker adaptation under mismatch between the training and test environments. Experimental results on a vocabulary-independent word recognition task (using the PBW 452 DB) show that the proposed methods yield improvements for small amounts of adaptation data. The bias compensation methods gave about 22∼30% relative improvement as the amount of adaptation data varied from 1 to 50, and the eigenvoice method using the mean of dimensional eigenvoices gave a 41% relative improvement in error rate with only a single adaptation word.

Performance Improvement of Fast Speaker Adaptation Based on Dimensional Eigenvoice and Adaptation Mode Selection (차원별 Eigenvoice와 화자적응 모드 선택에 기반한 고속화자적응 성능 향상)

  • 송화전;이윤근;김형순
    • The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.48-53 / 2003
  • The eigenvoice method is known to be well suited to fast speaker adaptation, but it shows hardly any additional improvement as the amount of adaptation data increases. To deal with this problem, we propose a modified method that estimates the eigenvoice weights separately in each feature vector dimension. We also propose an adaptation mode selection scheme that chooses, from several adaptation methods, the one with the highest performance for the given amount of adaptation data. We used the POW DB to construct the speaker-independent model and the eigenvoices; from the PBW 452 DB, 1 to 50 utterances were used for adaptation and the remaining 400 utterances for evaluation. As the amount of adaptation data increased, the proposed dimensional eigenvoice method outperformed both the conventional eigenvoice method and MLLR. Adaptation mode selection between the eigenvoice and dimensional eigenvoice methods reduced the word error rate by up to 26% compared with the conventional eigenvoice method.
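
The difference between shared and per-dimension weights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the adapted Gaussian means are observed directly and fits the weights by least squares, whereas the paper estimates them from adaptation utterances. The array layout (K eigenvoices, G Gaussians, D feature dimensions) and all function names are hypothetical.

```python
import numpy as np

def eigenvoice_weights(eigenvoices, mean, target):
    """Conventional form: one weight vector shared by all dimensions.

    eigenvoices: (K, G, D) array, mean and target: (G, D) arrays.
    Assumption: least-squares fit to target means, not an ML estimate.
    """
    K = eigenvoices.shape[0]
    A = eigenvoices.reshape(K, -1).T      # (G*D, K) design matrix
    b = (target - mean).ravel()           # (G*D,) offset from the mean model
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w                              # (K,)

def dimensional_eigenvoice_weights(eigenvoices, mean, target):
    """Dimensional form: a separate weight vector for each feature dimension."""
    K, G, D = eigenvoices.shape
    W = np.empty((D, K))
    for d in range(D):
        A = eigenvoices[:, :, d].T        # (G, K): only dimension d of each eigenvoice
        b = target[:, d] - mean[:, d]
        W[d], *_ = np.linalg.lstsq(A, b, rcond=None)
    return W                              # (D, K)

def adapt(eigenvoices, mean, W):
    """Rebuild adapted means from per-dimension weights W of shape (D, K)."""
    # adapted[g, d] = mean[g, d] + sum_k W[d, k] * eigenvoices[k, g, d]
    return mean + np.einsum("dk,kgd->gd", W, eigenvoices)
```

In this sketch the dimensional variant has D × K free parameters instead of K, which is one way to see why it can keep improving as more adaptation data becomes available.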

Self-Adaptation Algorithm Based on Maximum A Posteriori Eigenvoice for Korean Connected Digit Recognition (한국어 연결 숫자음 인식을 위한 최대 사후 Eigenvoice에 근거한 자기적응 기법)

  • Kim Dong Kook;Jeon Hyung Bae
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.590-596 / 2004
  • This paper presents a new self-adaptation algorithm based on maximum a posteriori (MAP) eigenvoice for Korean connected digit recognition. The proposed MAP eigenvoice is developed by introducing a probability density model for the eigenvoice coefficients. The proposed approach provides a unified framework that incorporates the prior model into conventional eigenvoice estimation. Because a self-adaptation system adapts using only the single utterance that is about to be recognized, we use MAP eigenvoice as the most robust adaptation method. In a series of self-adaptation experiments on the Korean connected digit recognition task, we demonstrate that the proposed approach performs better than the conventional eigenvoice algorithm for a small amount of adaptation data.
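
The effect of placing a prior on the eigenvoice coefficients can be sketched as follows. This is a simplified linear-Gaussian illustration, not the paper's formulation: it assumes a Gaussian prior on the coefficients and Gaussian noise on a directly observed mean offset. The names `ml_weights`, `map_weights`, `prior_mean`, `prior_precision`, and `noise_var` are hypothetical.

```python
import numpy as np

def ml_weights(E, offset):
    """Conventional estimate: least-squares fit with no prior.

    E: (N, K) matrix whose columns are eigenvoices, offset: (N,) vector.
    """
    w, *_ = np.linalg.lstsq(E, offset, rcond=None)
    return w

def map_weights(E, offset, prior_mean, prior_precision, noise_var=1.0):
    """MAP estimate assuming w ~ N(prior_mean, inv(prior_precision))
    and offset = E @ w + Gaussian noise with variance noise_var."""
    A = E.T @ E / noise_var + prior_precision
    b = E.T @ offset / noise_var + prior_precision @ prior_mean
    return np.linalg.solve(A, b)
```

With a zero prior precision the MAP estimate reduces to the conventional one, and with very little data or a strong prior it is pulled toward the prior mean, which is the behaviour that makes it attractive when only one utterance is available.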

Rapid Speaker Adaptation Based on Eigenvoice Using Weight Distribution Characteristics (가중치 분포 특성을 이용한 Eigenvoice 기반 고속화자적응)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.403-407 / 2003
  • Recently, the eigenvoice approach has been widely used for rapid speaker adaptation. However, even with this approach, the performance improvement obtained from a very small amount of adaptation data is relatively small compared with that obtained from somewhat larger amounts, because reliable estimation of the eigenvoice weights is difficult. In this paper, we propose an eigenvoice-based rapid speaker adaptation method that uses the weight distribution characteristics to improve performance with small amounts of adaptation data. In experiments on a vocabulary-independent word recognition task (using the PBW 452 database), the weight threshold method alleviates the problem of relatively low performance with very small amounts of adaptation data. When a single adaptation word is used, the weight threshold method reduces the word error rate by about 9-18%.

Fast Speaker Adaptation Using Sub-Stream Based Eigenvoice (Sub-Stream 기반의 Eigenvoice를 이용한 고속 화자적응)

  • Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI / v.55 / pp.93-102 / 2005
  • In this paper, a sub-stream based eigenvoice method is proposed to overcome the weak points of the conventional eigenvoice and dimensional eigenvoice methods. In the proposed method, sub-streams are constructed automatically by statistical clustering analysis that uses the correlation information between dimensions. To obtain a reliable distance matrix from the covariance matrix for dividing the dimensions into optimal sub-streams, a MAP adaptation technique is applied to the covariance matrix of the training data and the sample covariance of the adaptation data. According to our experiments, the proposed method shows a 41% error rate reduction when the amount of adaptation data is 50.

Efficient Rapid Speaker Adaptation Using Merging Eigenvoices (Eigenvoice 병합을 이용한 효율적인 고속 화자 적응)

  • Choi Dong-jin;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.115-118 / 2004
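  • In speech recognition, various efforts are being made to bring the performance of speaker-independent systems close to that of speaker-dependent systems through speaker adaptation. In particular, interest is growing in rapid speaker adaptation, which uses a very small amount of adaptation data (less than 30 seconds). Eigenvoice-based adaptation is well suited to rapid speaker adaptation, but constructing the eigenvoices requires too much computation and memory. This paper proposes a method that merges separately computed eigenvoices so that they have almost the same accuracy as eigenvoices constructed all at once, and uses the merged eigenvoices for rapid speaker adaptation. With this method, when training data is added, eigenvoices are computed only for the added data and then merged instead of being recomputed from scratch, which significantly reduces computation and memory. In experiments, memory and computation decreased according to the number of added speaker-dependent models, with almost no loss in performance.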

Speaker Segmentation System Using Eigenvoice-based Speaker Weight Distance Method (Eigenvoice 기반 화자가중치 거리측정 방식을 이용한 화자 분할 시스템)

  • Choi, Mu-Yeol;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.31 no.4 / pp.266-272 / 2012
  • Speaker segmentation is the process of automatically detecting speaker boundary points in audio data. Speaker segmentation methods fall into two categories depending on whether they use prior knowledge: model-based segmentation and metric-based segmentation. In this paper, we introduce the eigenvoice-based speaker weight distance method and compare it with representative metric-based methods. We also employ and compare the Euclidean and cosine similarity functions for calculating the distance between speaker weight vectors. Finally, we verify that the speaker weight distance method is computationally very efficient compared with directly using the distance between the speaker-adapted models constructed by the eigenvoice technique.
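
The two distance functions named above can be sketched in a few lines. This is an illustration only: converting cosine similarity to a distance as 1 minus the similarity, and flagging a boundary when the distance between adjacent windows exceeds a threshold, are assumptions rather than details given in the abstract. The function names and the `threshold` parameter are hypothetical.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two speaker weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity (assumed conversion from similarity to distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_u * norm_v)

def detect_boundaries(weights, threshold, distance=euclidean_distance):
    """Return indices i where window i differs from window i - 1 by more
    than threshold. weights holds one weight vector per analysis window."""
    return [
        i
        for i in range(1, len(weights))
        if distance(weights[i - 1], weights[i]) > threshold
    ]
```

Because each window is summarized by a short weight vector rather than a full adapted model, each comparison costs only a handful of arithmetic operations, which is consistent with the efficiency claim in the abstract.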

Eigenvoice Adaptation of Classification Model for Binary Mask Estimation (Eigenvoice를 이용한 이진 마스크 분류 모델 적응 방법)

  • Kim, Gibak
    • Journal of Broadcast Engineering / v.20 no.1 / pp.164-170 / 2015
  • This paper deals with adapting the classification model in the binary mask approach to noise suppression in noisy environments. The binary mask estimation approach is known to improve the intelligibility of noisy speech. However, building the classification model for binary mask estimation requires the training data to include the same type of noise as the test data. Eigenvoice adaptation is applied to the noise-independent classification model, and the adapted model is used as the noise-dependent model. The results are reported as hit rates and false alarm rates. The experimental results confirm that classification accuracy improves as the number of adaptation sentences increases.

Rapid Speaker Adaptation for Continuous Speech Recognition Using Merging Eigenvoices (Eigenvoice 병합을 이용한 연속 음성 인식 시스템의 고속 화자 적응)

  • Choi, Dong-Jin;Oh, Yung-Hwan
    • MALSORI / no.53 / pp.143-156 / 2005
  • Speaker adaptation in eigenvoice space is a popular method for rapid speaker adaptation. To improve the performance of the method, the number of speaker-dependent models should be increased and the eigenvoices re-estimated. However, principal component analysis takes a long time to find the eigenvoices, especially in a continuous speech recognition system. This paper describes a method that reduces computation time by estimating eigenvoices only for the supplementary speaker-dependent models and merging them with the existing eigenvoices. Experimental results show that computation time is reduced by 73.7% while performance remains almost the same when the number of supplementary speaker-dependent models equals the number of models already used.

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
  • This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments using a clustering method that takes the cepstral mean of non-speech segments as the feature vector. The adaptation data in each cluster is then used to build an environment-clustered speaker-adapted (SA) model. After selecting multiple environment-clustered SA models that are similar to the test environment, speaker adaptation is performed using an appropriate linear combination of the selected SA models. In our experiments, the proposed method reduces the error rate by 40∼59% relative to the baseline speaker-independent model.