• Title/Summary/Keyword: Speaker Adaptation

On Codebook Design to Improve Speaker Adaptation (음성 인식 시스템의 화자 적응 성능 향상을 위한 코드북 설계)

  • Yang, Tae-Young;Shin, Won-Ho;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.5-11 / 1996
  • This paper proposes a method for improving the performance of a semi-continuous hidden Markov model (SCHMM) speaker adaptation system that uses a Bayesian parameter re-estimation approach. The performance of Bayesian speaker adaptation can degrade when the features of a new speaker differ severely from those of the reference codebook. Excessive codewords of the reference codebook remain after the adaptation process and cause confusion during recognition. To solve these problems, the proposed method uses formant information extracted from the cepstral coefficients of the reference codebook and the adaptation data. The reference codebook is adapted to represent the formant distribution of the new speaker and is then used as the initial codebook for Bayesian speaker adaptation. The proposed method provides an accurate correspondence between the reference codebook and the adaptation data. It was observed that the excessive codewords were not selected during recognition. The experimental results showed that the proposed method improved recognition performance.


Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2785-2799 / 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and an unknown utterance is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented in a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate word error rates on the Resource Management database. Compared with the conventional speaker-independent speech recognition model, the proposed speaker adaptation method using i-vector based clustering gave a relative improvement of up to 12.2% for the conventional fully connected neural network and up to 10.5% for the bidirectional long short-term memory.
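
The cluster-selection step described in this abstract can be sketched as follows. This is a minimal illustration only, assuming that i-vectors are available as plain lists of floats and that each speaker cluster is summarized by a centroid vector; the function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def closest_cluster(ivector, centroids):
    """Return the index of the centroid with the smallest cosine distance."""
    return min(
        range(len(centroids)),
        key=lambda k: cosine_distance(ivector, centroids[k]),
    )


# Toy 2-dimensional "i-vectors" (hypothetical values, for illustration only).
centroids = [[1.0, 0.0], [0.0, 1.0]]
chosen = closest_cluster([0.9, 0.1], centroids)
print(chosen)  # index of the cluster whose retrained model would be used
```

In a real system the i-vectors would have hundreds of dimensions and the chosen index would select one of the retrained acoustic models; how the paper represents each cluster is not stated in the abstract, so the centroid representation here is an assumption.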

Speaker Adaptation in HMM-based Korean Isolated Word Recognition (한국어 격리단어 인식 시스템에서 HMM 파라미터의 화자 적응)

  • 오광철;이황수;은종관
    • The Transactions of the Korean Institute of Electrical Engineers / v.40 no.4 / pp.351-359 / 1991
  • This paper describes the performance of speaker adaptation using a probabilistic spectral mapping matrix in hidden Markov model (HMM)-based Korean isolated word recognition. Speaker adaptation based on probabilistic spectral mapping uses well-trained prototype HMMs and is carried out by the Viterbi, dynamic time warping, and forward-backward algorithms. Among these algorithms, the best performance is obtained with the Viterbi approach together with codebook adaptation, which improves isolated word recognition accuracy by 42.6-68.8%. In addition, the selection of the initial values of the matrix and the normalization used in computing the matrix affect the recognition accuracy.

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers (참조화자로부터 추정된 적응적 혼성 사전분포를 이용한 MAPLR 고속 화자적응)

  • Song, Young-Rok;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.315-323 / 2011
  • This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In general, the prior distribution of the transformation matrix used in MAPLR adaptation is estimated from all of the training speakers used to construct the speaker-independent model, and it is applied identically to all new speakers. In this paper, we propose a method in which the prior distribution is estimated from a group of reference speakers, selected using the adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. Additionally, for MAPLR adaptation with a block-diagonal transformation matrix, we propose a method in which the mean matrix and the covariance matrix of the prior distribution are estimated from two groups of transformation matrices obtained from the same training speakers, respectively. To evaluate the proposed methods, we examine word accuracy as a function of the number of adaptation words in an isolated word recognition task. Experimental results show that, for very limited adaptation data, a statistically significant performance improvement is obtained in comparison with conventional MAPLR adaptation.

Performance Improvement of Fast Speaker Adaptation Based on Dimensional Eigenvoice and Adaptation Mode Selection (차원별 Eigenvoice와 화자적응 모드 선택에 기반한 고속화자적응 성능 향상)

  • 송화전;이윤근;김형순
    • The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.48-53 / 2003
  • The eigenvoice method is known to be adequate for fast speaker adaptation, but it shows hardly any additional improvement as the amount of adaptation data increases. To deal with this problem, we propose a modified method that estimates the weights of the eigenvoices separately in each feature vector dimension. We also propose an adaptation mode selection scheme in which the best-performing of several adaptation methods is selected according to the amount of adaptation data. We used the POW DB to construct the speaker-independent model and the eigenvoices; from the PBW 452 DB, 1 to 50 utterances were used for adaptation and the remaining 400 utterances for evaluation. As the amount of adaptation data increased, the proposed dimensional eigenvoice method outperformed both the conventional eigenvoice method and MLLR. Compared with the conventional eigenvoice method, adaptation mode selection between the eigenvoice and dimensional eigenvoice methods reduced the word error rate by up to 26%.

Fast speaker adaptation using extended diagonal linear transformation for deep neural networks

  • Kim, Donghyun;Kim, Sanghun
    • ETRI Journal / v.41 no.1 / pp.109-116 / 2019
  • This paper explores new techniques based on a hidden-layer linear transformation for fast speaker adaptation in deep neural networks (DNNs). Conventional methods using affine transformations are ineffective because they require a relatively large number of parameters. Methods that employ singular-value decomposition (SVD) are used instead because they are effective at reducing the number of adaptation parameters. However, matrix decomposition is computationally expensive for online services. We propose an extended diagonal linear transformation method that minimizes the adaptation parameters without SVD and increases performance for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error-reduction rates of 8.4% and 17.1% for adaptation with five and 50 conversational sentences, respectively. Compared with adaptation methods using SVD, recognition performance increases with fewer parameters.

Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using Simulated Speech Model (모의 음성 모델을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Sung, Mee Young;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.5 / pp.1243-1250 / 2015
  • This paper presents an effective method for recognizing the speech of VPI patients, intended for a VPI speech reconstruction system. A speaker adaptation technique is employed to improve VPI speech recognition. To make effective use of the small amount of VPI speech available for model adaptation, this paper proposes using simulated speech to generate the initial model for speaker adaptation. We obtain 83.60% average word accuracy by applying MLLR for speaker adaptation. The proposed speaker adaptation method using the simulated speech model brings a 6.38% improvement in average accuracy. The experimental results demonstrate that the proposed method is highly effective for developing a recognition system for VPI speech, for which constructing a large speech database is impractical.

Performance Improvement of Rapid Speaker Adaptation Using Bias Compensation and Mean of Dimensional Eigenvoice Models (바이어스 보상과 차원별 Eigenvoice 모델 평균을 이용한 고속화자적응의 성능향상)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.383-389 / 2004
  • In this paper, we propose bias compensation methods and an eigenvoice method using the mean of dimensional eigenvoice models to improve the performance of eigenvoice-based rapid speaker adaptation under a mismatch between the training and test environments. Experimental results for a vocabulary-independent word recognition task (using the PBW 452 DB) show that the proposed methods yield improvements for small amounts of adaptation data. We obtained about 22-30% relative improvement with the bias compensation methods as the amount of adaptation data varied from 1 to 50, and a 41% relative improvement in error rate with the eigenvoice method using the mean of dimensional eigenvoice models with only a single adaptation word.
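
Several abstracts in this listing report relative improvements in error rate. As a point of reference, the commonly used definition of relative error-rate reduction can be sketched as below. The function name and the sample error rates are hypothetical; the absolute error rates behind the figures above are not given in the abstracts, and this sketch assumes the papers use the standard definition.

```python
def relative_error_reduction(baseline_error, adapted_error):
    """Relative reduction in error rate, in percent.

    Both arguments are error rates in the same unit (e.g. percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - adapted_error) / baseline_error * 100.0


# Hypothetical numbers, chosen only to illustrate the arithmetic:
# a baseline error rate of 10.0% lowered to 5.9% after adaptation.
print(round(relative_error_reduction(10.0, 5.9), 1))
```

Note that a relative reduction is not the same as an absolute difference: in the hypothetical case above the absolute error rate drops by 4.1 percentage points.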

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
  • This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments, using the cepstral mean of non-speech segments as the feature vector. The adaptation data in each cluster are then used to build an environment-clustered speaker-adapted (SA) model. After selecting multiple environment-clustered SA models that are similar to the test environment, speaker adaptation is performed using an appropriate linear combination of the selected SA models. In our experiments, the proposed method reduced the error rate by 40-59% compared with the baseline speaker-independent model.

The Comparison of Characteristics in various Speaker Adaptation Methods (여러 화자 적응 방법들의 특성 비교)

  • 황영수
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.339-342 / 1998
  • In this paper, we propose various speaker adaptation methods and study their performance. The methods studied are MAPE (Maximum A Posteriori Probability Estimation) and ARTMAP. To evaluate their performance, we used Korean isolated digits as the experimental data. The hybrid speaker adaptation method, which unified MAPE, linear spectral estimation, and the output probability of the SCHMM, showed better recognition results than the other methods. The method using ARTMAP showed results similar to those of the hybrid method.
