• Title/Summary/Keyword: Speaker Adaptation

Search Result 122, Processing Time 0.026 seconds

Flexible Speaker Adaptation Reflecting the Quality of Adaptation Data (Adaptation Data의 Quality를 고려한 강인한 화자 적응)

  • Pyo Hyun-A;Kim Se-Hyun;Oh Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.37-40
    • /
    • 2002
  • 최근 음성 인식 시스템의 성능 향상을 위해 화자 적응(speaker adaptation)에 대한 연구가 활발히 진행되고 있다. HMM 기반 인식 시스템의 모델 파라미터를 수정하는 화자 적응의 경우, MAP 방법과 MLLR 방법에 대한 연구가 주류를 이루고 있다. 두 방법은 adaptation data의 양에 따라서 서로 다른 성능을 보인다. 본 논문에서는 adaptation data의 quality를 정의하고, 이를 기존 두 방법의 가중치로 이용하여 화자 적응을 수행하는 방법을 제안한다. 제안한 방법을 KAIST 통신연구실에서 구축한 한국어 도시이름 500단어 인식 시스템에 적용하여 성능을 개선하였다.

  • PDF

Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique (수정된 MAP 적응 기법을 이용한 음성 데이터 자동 군집화)

  • Ban, Sung Min;Kang, Byung Ok;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.6 no.1
    • /
    • pp.77-83
    • /
    • 2014
  • This paper proposes a speaker and environment clustering method in order to overcome the degradation of the speech recognition performance caused by various noise and speaker characteristics. In this paper, instead of using the distance between Gaussian mixture model (GMM) weight vectors as in the Google's approach, the distance between the adapted mean vectors based on the modified maximum a posteriori (MAP) adaptation is used as a distance measure for vector quantization (VQ) clustering. According to our experiments on the simulation data generated by adding noise to clean speech, the proposed clustering method yields error rate reduction of 10.6% compared with baseline speaker-independent (SI) model, which is slightly better performance than the Google's approach.

The Comparison of Speaker Adaptation Methods (화자 적응 방법들의 비교)

  • 황영수
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.1
    • /
    • pp.61-66
    • /
    • 1999
  • In this paper, we proposed various speaker adaptation methods and studied the performance of these methods. Methods which were studied in this paper are MAPE(Maximum A Posteriori Probability Estimation), Linear Spectral Estimating, Multi-Layer Perceptron and ARTMAP. In order to evaluate the performance of these methods, we used Korean isolated digits as the experimental data, the hybrid speaker adaptation method, which unified MAPE, linear spectral estimating and output probability of SCHMM, showed the better recognition result than those which performed other methods. And the method using ARTMAP showed the similar result to above hybrid method.

  • PDF

Speech Recognition Using the Energy and VQ (에너지와 VQ를 이용한 음성 인식)

  • Hwang, Young-Soo
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.6 no.3
    • /
    • pp.87-94
    • /
    • 2007
  • In this paper, the performance of the speech recognition and speaker adaptation methods are studied. The speech recognition using energy state and VQ(Vector Quantization) is suggested and the speaker adaptation methods(Maximum a posteriori probability estimation, linear specrum estimation) are considered. The experimental results show that recognition ration using energy state is 2-3 % better than that of general VQ.

  • PDF

Fast Speaker Adaptation Using Sub-Stream Based Eigenvoice (Sub-Stream 기반의 Eigenvoice를 이용한 고속 화자적응)

  • Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI
    • /
    • v.55
    • /
    • pp.93-102
    • /
    • 2005
  • In this paper, sub-stream based eigenvoice method is proposed to overcome the weak points of conventional eigenvoice and dimensional eigenvoice. In the proposed method, sub-streams are automatically constructed by the statistical clustering analysis that uses the correlation information between dimensions. To obtain the reliable distance matrix from covariance matrix for dividing into optimal sub-streams, MAP adaptation technique is employed to the covariance matrix of training data and the sample covariance of adaptation data. According to our experiments, the proposed method shows $41\%$ error rate reduction when the number of adaptation data is 50.

  • PDF

Improvement of MLLR Speaker Adaptation Algorithm to Reduce Over-adaptation Using ICA and PCA (과적응 감소를 위한 주성분 분석 및 독립성분 분석을 이용한 MLLR 화자적응 알고리즘 개선)

  • 김지운;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.7
    • /
    • pp.539-544
    • /
    • 2003
  • This paper describes how to reduce the effect of an occupation threshold by that the transform of mixture components of HMM parameters is controlled in hierarchical tree structure to prevent from over-adaptation. To reduce correlations between data elements and to remove elements with less variance, we employ PCA (Principal component analysis) and ICA (independent component analysis) that would give as good a representation as possible, and decline the effect of over-adaptation. When we set lower occupation threshold and increase the number of transformation function, ordinary MLLR adaptation algorithm represents lower recognition rate than SI models, whereas the proposed MLLR adaptation algorithm represents the improvement of over 2% for the word recognition rate as compared to performance of SI models.

Speaker Identification Using Augmented PCA in Unknown Environments (부가 주성분분석을 이용한 미지의 환경에서의 화자식별)

  • Yu, Ha-Jin
    • MALSORI
    • /
    • no.54
    • /
    • pp.73-83
    • /
    • 2005
  • The goal of our research is to build a text-independent speaker identification system that can be used in any condition without any additional adaptation process. The performance of speaker recognition systems can be severely degraded in some unknown mismatched microphone and noise conditions. In this paper, we show that PCA(principal component analysis) can improve the performance in the situation. We also propose an augmented PCA process, which augments class discriminative information to the original feature vectors before PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method reduced the relative recognition error by 21%.

  • PDF

On the Simple Speaker Verification System Using Tolerance Interval Analysis Without Background Speaker Models (Tolerance Interval Analysis를 이용한 배경화자 없는 간단한 화자인증시스템에 관한 연구)

  • Choi, Hong-Sub
    • MALSORI
    • /
    • no.56
    • /
    • pp.147-158
    • /
    • 2005
  • In this paper, we are focused to develop the simplified speaker verification algorithm without background speaker models, which will be adopted in the portable speaker verification system equipped in portable terminals such as mobile phone and PMP. According to the tolerance interval analysis, the population of someone's speaker model can be represented by a suitable number of selected independent samples of speaker model. So we can make the representative speaker model and threshold under the specified confidence level and coverage. Using proposed algorithm with the number of samples is 40, the experiments show that the false rejection rate is $3.0\%$ and the false acceptance rate $4.3\%$, worth comparing to conventional method's results, $5.4\%\;and\;5.5\%$, respectively. Next step of research will be on the suitable adaptation methods to overcome speech variation problems due to aging effect and operating environments.

  • PDF

A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing (상태레벨 공유를 이용한 MLLR 적응화의 회귀클래스 생성에 관한 연구)

  • 오세진;성우창;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.8
    • /
    • pp.727-739
    • /
    • 2003
  • In this paper, we propose a generation method of regression classes for adaptation in the HM-Net (Hidden Markov Network) system. The MLLR (Maximum Likelihood Linear Regression) adaptation approach is applied to the HM-Net speech recognition system for expressing the characteristics of speaker effectively and the use of HM-Net in various tasks. For the state level sharing, the context domain state splitting of PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, which has the contextual and time domain clustering, is adopted. In each state of contextual domain, the desired phoneme classes are determined by splitting the context information (classes) including target speaker's speech data. The number of adaptation parameters, such as means and variances, is autonomously controlled by contextual domain state splitting of PDT-SSS, depending on the context information and the amount of adaptation utterances from a new speaker. The experiments are performed to verify the effectiveness of the proposed method on the KLE (The center for Korean Language Engineering) 452 data and YNU (Yeungnam Dniv) 200 data. The experimental results show that the accuracies of phone, word, and sentence recognition system increased by 34∼37%, 9%, and 20%, respectively, Compared with performance according to the length of adaptation utterances, the performance are also significantly improved even in short adaptation utterances. Therefore, we can argue that the proposed regression class method is well applied to HM-Net speech recognition system employing MLLR speaker adaptation.

Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using DNN-HMM-based System (DNN-HMM 기반 시스템을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Yoon, Ki-mu;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.1
    • /
    • pp.33-38
    • /
    • 2019
  • This paper proposes an effective recognition method of VPI patient's speech employing DNN-HMM-based speech recognition system, and evaluates the recognition performance compared to GMM-HMM-based system. The proposed method employs speaker adaptation technique to improve VPI speech recognition. This paper proposes to use simulated VPI speech for generating a prior model for speaker adaptation and selective learning of weight matrices of DNN, in order to effectively utilize the small size of VPI speech for model adaptation. We also apply Linear Input Network (LIN) based model adaptation technique for the DNN model. The proposed speaker adaptation method brings 2.35% improvement in average accuracy compared to GMM-HMM based ASR system. The experimental results demonstrate that the proposed DNN-HMM-based speech recognition system is effective for VPI speech with small-sized speech data, compared to conventional GMM-HMM system.