• Title/Summary/Keyword: Speaker Adaptation

Search Result 122, Processing Time 0.028 seconds

A Study on the Speaker Adaptation in CDHMM (CDHMM의 화자적응에 관한 연구)

  • Kim, Gwang-Tae
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.2
    • /
    • pp.116-127
    • /
    • 2002
  • A new approach to improve the speaker adaptation algorithm by means of the variable number of observation density functions for CDHMM speech recognizer has been proposed. The proposed method uses the observation density function with more than one mixture in each state to represent speech characteristics in detail. The number of mixtures in each state is determined by the number of frames and the determinant of the variance, respectively. The each MAP Parameter is extracted in every mixture determined by these two methods. In addition, the state segmentation method requiring speaker adaptation can segment the adapting speech more Precisely by using speaker-independent model trained from sufficient database as a priori knowledge. And the state duration distribution is used lot adapting the speech duration information owing to speaker's utterance habit and speed. The recognition rate of the proposed methods are significantly higher than that of the conventional method using one mixture in each state.

Hybrid Speaker Adaptation using Maximum-Likelihood Estimation (MLE를 이용한 하이브리드 화자 적응)

  • 표현아;김세현;오영환
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10d
    • /
    • pp.268-270
    • /
    • 2002
  • 최근 음성 인식 시스템의 성능 향상을 위해 화자 적응 (speaker adaptation)에 대한 연구가 활발히 진행되고 있다. HMM 기반 인식 시스템의 모델 파라미터를 수정하는 화자 적응의 경우, MAP방법과 MLLR 방법에 대한 연구가 주류를 이루고 있다. 두 방법은 adaptation data의 양에 따라서 서로 다른 성능을 보인다. 본 논문에서는 기존 두 방법을 Maximum-likelihood Estimation(MLE)를 이용하여 화자 적응을 수행하는 방법을 제안한다. 제안한 방법을 KAIST 통신연구실에서 구축한 한국어 도시이름 500단어 인식 시스템에 적용하여 adaptation data의 양에 상관없이 항상 높은 성능을 나타냈으며, 기존의 방법에 대해서 최고 4.37%의 인식률 향상을 보였다.

  • PDF

Reduction of Dimension of HMM parameters in MLLR Framework for Speaker Adaptation (화자적응시스템을 위한 MLLR 알고리즘 연산량 감소)

  • Kim Ji Un;Jeong Jae Ho
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.123-126
    • /
    • 2003
  • We discuss how to reduce the number of inverse matrix and its dimensions requested in MLLR framework for speaker adaptation. To find a smaller set of variables with less redundancy, we employ PCA(principal component analysis) and ICA(independent component analysis) that would give as good a representation as possible. The amount of additional computation when PCA or ICA is applied is as small as it can be disregarded. The dimension of HMM parameters is reduced to about 1/3 ~ 2/7 dimensions of SI(speaker independent) model parameter with which speech recognition system represents word recognition rate as much as ordinary MLLR framework. If dimension of SI model parameter is n, the amount of computation of inverse matrix in MLLR is proportioned to O($n^4$). So, compared with ordinary MLLR, the amount of total computation requested in speaker adaptation is reduced to about 1/80~1/150.

  • PDF

The Study on the Speaker Adaptation Using Speaker Characteristics of Phoneme (음소에 따른 화자특성을 이용한 화자적응방법에 관한 연구)

  • 채나영;황영수
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2003.06a
    • /
    • pp.6-9
    • /
    • 2003
  • In this paper, we studied on the difference of speaker adaptation according to the phoneme classification for Korean Speech recognition. In order to study of speech adaptation according to the weight of difference of phoneme as recognition unit, we used SCHMM as recognition system. And Speaker adaptation method used in this paper was MAPE(Maximum A Posteriori Probability Estimation), Linear Spectral Estimation. In order to evaluate the performance of these methods, we used 10 Korean isolated numbers as the experimental data. It is possible for the first and the second methods to be carried out unsupervised learning and used in on-line system. And the first method was shown performance improvement over the second method, and hybrid adaptation showed the better recognition results than those which performed each method. And the result of Speaker adaptation using the variable weight according to the phoneme had better than the result using fixed weight.

  • PDF

Safety Robust Speaker Recognition Against Utterance Variationsed (발성변화에 강인한 화자 인식에 관한 연구)

  • Lee Ki-Yong
    • Journal of Internet Computing and Services
    • /
    • v.5 no.2
    • /
    • pp.69-73
    • /
    • 2004
  • A speaker model In speaker recognition system is to be trained from a large data set gathered in multiple sessions. Large data set requires large amount of memory and computation, and moreover it's practically hard to make users utter the data inseveral sessions. Recently the incremental adaptation methods are proposed to cover the problems, However, the data set gathered from multiple sessions is vulnerable to the outliers from the irregular utterance variations and the presence of noise, which result in inaccurate speaker model. In this paper, we propose an incremental robust adaptation method to minimize the influence of outliers on Gaussian Mixture Madel based speaker model. The robust adaptation is obtained from an incremental version of M-estimation. Speaker model is initially trained from small amount of data and it is adapted recursively with the data available in each session, Experimental results from the data set gathered over seven months show that the proposed method is robust against outliers.

  • PDF

Fast Speaker Adaptation Based on Eigenspace-based MLLR Using Artificially Distorted Speech in Car Noise Environment (차량 잡음 환경에서 인위적 왜곡 음성을 이용한 Eigenspace-based MLLR에 기반한 고속 화자 적응)

  • Song, Hwa-Jeon;Jeon, Hyung-Bae;Kim, Hyung-Soon
    • Phonetics and Speech Sciences
    • /
    • v.1 no.4
    • /
    • pp.119-125
    • /
    • 2009
  • This paper proposes fast speaker adaptation method using artificially distorted speech in telematics terminal under the car noise environment based on eigenspace-based maximum likelihood linear regression (ES-MLLR). The artificially distorted speech is built from adding the various car noise signals collected from a driving car to the speech signal collected from an idling car. Then, in every environment, the transformation matrix is estimated by ES-MLLR using the artificially distorted speech corresponding to the specific noise environment. In test mode, an online model is built by weighted sum of the environment transformation matrices depending on the driving condition. In 3k-word recognition task in the telematics terminal, we achieve a performance superior to ES-MLLR even using the adaptation data collected from the driving condition.

  • PDF

An improved spectrum mapping applied to speaker adaptive Kroean word recognition

  • Matsumoto, Hiroshi;Lee, Yong-Ju;Kim, Hoi-Rim;Kido, Ken'iti
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1009-1014
    • /
    • 1994
  • This paper improves the previously proposed spectral mapping method for supervised speaker adaptation in which a mapped spectrum is interpolated from speaker difference vectors at typical spectra based on a minimized distortion criterion. In estimating these difference vectors, it is important to find an appropriate number of typical points. The previous method empirically adjusts the number of typical points, while the present method optimizes the effective number by rank reduction of normal equation. This algorithm was applied to a supervised speaker adaptation for Korean word recognition using the templates form a prototype male speaker. The result showed that the rank reduction technique not only can automatically determine an optimal number of code vectors, but also slightly improves the recognition scores compared with those obtained by the previous method.

  • PDF

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal
    • /
    • v.33 no.5
    • /
    • pp.741-751
    • /
    • 2011
  • This paper focuses on online speaker tracking for telephone conversations and broadcast news. Since the online applicability imposes some limitations on the tracking strategy, such as data insufficiency, a reliable approach should be applied to compensate for this shortage. In this framework, a set of reference speaker models are used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in eigenvoice decomposition space are proposed in this paper. We believe that the eigenvoice adaptation techniques would help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. Also, an index structure of the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpus as well as a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. Using the index tree structure approach, the run time of the proposed framework is improved by 22%.

L1-norm Regularization for State Vector Adaptation of Subspace Gaussian Mixture Model (L1-norm regularization을 통한 SGMM의 state vector 적응)

  • Goo, Jahyun;Kim, Younggwan;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.131-138
    • /
    • 2015
  • In this paper, we propose L1-norm regularization for state vector adaptation of subspace Gaussian mixture model (SGMM). When you design a speaker adaptation system with GMM-HMM acoustic model, MAP is the most typical technique to be considered. However, in MAP adaptation procedure, large number of parameters should be updated simultaneously. We can adopt sparse adaptation such as L1-norm regularization or sparse MAP to cope with that, but the performance of sparse adaptation is not good as MAP adaptation. However, SGMM does not suffer a lot from sparse adaptation as GMM-HMM because each Gaussian mean vector in SGMM is defined as a weighted sum of basis vectors, which is much robust to the fluctuation of parameters. Since there are only a few adaptation techniques appropriate for SGMM, our proposed method could be powerful especially when the number of adaptation data is limited. Experimental results show that error reduction rate of the proposed method is better than the result of MAP adaptation of SGMM, even with small adaptation data.

Speaker Adaptation for Voice Dialing (음성 다이얼링을 위한 화자적응)

  • ;Chin-Hui Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.455-461
    • /
    • 2002
  • This paper presents a method that improves the performance of the personal voice dialling system in which speaker independent phoneme HMM's are used. Since the speaker independent phoneme HMM based voice dialing system uses only the phone transcription of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the system which uses the speaker dependent models due to the phone recognition errors generated when the speaker independent models are used. In order to solve this problem, a new method that jointly estimates transformation vectors for the speaker adaptation and transcriptions from training utterances is presented. The biases and transcriptions are estimated iteratively from the training data of each user with maximum likelihood approach to the stochastic matching using speaker-independent phone models. Experimental result shows that the proposed method is superior to the conventional method which used transcriptions only.