• Title/Summary/Keyword: Speaker Adaptation


Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using DNN-HMM-based System (DNN-HMM 기반 시스템을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Yoon, Ki-mu;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.1 / pp.33-38 / 2019
  • This paper proposes an effective recognition method for VPI patients' speech employing a DNN-HMM-based speech recognition system, and evaluates its recognition performance against a GMM-HMM-based system. The proposed method employs speaker adaptation to improve VPI speech recognition. To make effective use of the small amount of VPI speech available for model adaptation, this paper proposes using simulated VPI speech to generate a prior model for speaker adaptation, together with selective learning of the DNN's weight matrices. We also apply a Linear Input Network (LIN) based model adaptation technique to the DNN model. The proposed speaker adaptation method brings a 2.35% improvement in average accuracy over the GMM-HMM-based ASR system. The experimental results demonstrate that the proposed DNN-HMM-based speech recognition system is effective for VPI speech with small-sized speech data, compared to the conventional GMM-HMM system.
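The LIN idea above, prepending a trainable linear layer to an otherwise fixed network, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it fits the linear input transform by plain least squares against hypothetical reference-space targets, whereas the paper trains the LIN by backpropagation through the frozen DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: adaptation-speaker feature frames and hypothetical targets in the
# speaker-independent "reference" feature space.
X = rng.normal(size=(200, 13))                       # adaptation-speaker frames
A_true = np.eye(13) + 0.1 * rng.normal(size=(13, 13))
Y = X @ A_true.T + 0.05 * rng.normal(size=(200, 13))  # reference-space targets

# Fit the linear input transform W so that X @ W approximates Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

residual = np.mean((X @ W - Y) ** 2)   # error after the learned transform
baseline = np.mean((X - Y) ** 2)       # error with the identity transform
print(residual < baseline)             # the learned transform fits better
```

At test time the fitted transform is simply applied to every incoming frame before the (unchanged) acoustic model sees it.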

Korean Word Recognition Using Vector Quantization Speaker Adaptation (벡터 양자화 화자적응기법을 사용한 한국어 단어 인식)

  • Choi, Kap-Seok
    • The Journal of the Acoustical Society of Korea / v.10 no.4 / pp.27-37 / 1991
  • This paper proposes energy subspace fuzzy vector quantization (ESFVQ), which employs energy subspaces to achieve lower quantization distortion than fuzzy vector quantization. The ESFVQ is applied to a speaker adaptation method by which Korean words spoken by unknown speakers are recognized. By generating mapped codebooks with fuzzy histograms for each energy subspace in the training procedure, and by decoding a spoken word through the ESFVQ in the recognition procedure, we attempt to improve the recognition rate. The performance of the ESFVQ is evaluated by measuring the quantization distortion and the speaker-adaptive recognition rate for DDD telephone area names uttered by two males and one female. The quantization distortion of the ESFVQ is 22% lower than that of vector quantization and 5% lower than that of fuzzy vector quantization, and the speaker-adaptive recognition rate of the ESFVQ is 26% higher than that without speaker adaptation and 11% higher than that of vector quantization.

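As a rough illustration of hard versus fuzzy vector quantization, the sketch below trains a small codebook with Lloyd iterations and compares nearest-codeword distortion with a membership-weighted (fuzzy) reconstruction. The energy-subspace partitioning and mapped codebooks that define ESFVQ are not modeled here; all data and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.normal(size=(300, 2))                # toy spectral feature frames
codebook = frames[rng.choice(300, 8, replace=False)].copy()

# A few Lloyd iterations to train the codebook.
for _ in range(10):
    d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
    labels = d.argmin(axis=1)
    for k in range(len(codebook)):
        if np.any(labels == k):
            codebook[k] = frames[labels == k].mean(axis=0)

# Hard VQ: each frame is replaced by its nearest codeword.
d = np.linalg.norm(frames[:, None] - codebook[None], axis=2)
hard = d.min(axis=1).mean()

# Fuzzy VQ: each frame is reconstructed as a membership-weighted mix of codewords.
m = 2.0                                           # fuzziness exponent (assumed)
u = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)
u /= u.sum(axis=1, keepdims=True)                 # fuzzy memberships per frame
recon = (u[:, :, None] * codebook[None]).sum(axis=1)
fuzzy = np.linalg.norm(frames - recon, axis=1).mean()

print(f"hard VQ distortion {hard:.3f}, fuzzy VQ distortion {fuzzy:.3f}")
```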

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.3 / pp.21-26 / 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, and surprise, using discrete-duration continuous hidden Markov models (DDCHMM). The emotional feature parameters are first defined from the input speech signals. In this study, we used prosodic parameters, namely pitch, energy, and their derivatives, which were then trained by HMM for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. The simulation results showed that the vocal emotion recognition rate gradually increased as the number of adaptation samples grew.

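The MAP adaptation used for the speaker-adapted emotion models can be illustrated by its simplest case, the update of a Gaussian mean: the adapted mean interpolates between the prior (speaker-independent) mean and the adaptation-data mean, with the data side gaining weight as more samples arrive. That is consistent with the observation above that recognition improves with the number of adaptation samples. A minimal sketch, in which the prior weight `tau` and the toy data are assumptions:

```python
import numpy as np

def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """MAP update of a Gaussian mean with prior weight tau."""
    n = len(adapt_frames)
    sample_mean = np.mean(adapt_frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)

prior = np.zeros(3)
frames_small = np.ones((2, 3))     # 2 adaptation frames
frames_large = np.ones((200, 3))   # 200 adaptation frames

small = map_adapt_mean(prior, frames_small)   # stays close to the prior
large = map_adapt_mean(prior, frames_large)   # moves toward the data
print(small[0], large[0])
```

With 2 frames the mean barely moves (2/12 of the way to the data); with 200 frames it is pulled almost all the way.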

A study on the speaker adaptation in CDHMM using variable number of mixtures in each state (CDHMM의 상태당 가지 수를 가변시키는 화자적응에 관한 연구)

  • 김광태;서정일;홍재근
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.3 / pp.166-175 / 1998
  • When we build a speaker-adapted model using MAPE (maximum a posteriori estimation), the adapted model has one mixture per state, because the number of a priori distributions per state cannot be estimated from a speaker-independent model. A model with one mixture per state adapts poorly to a specific speaker, since it is difficult to represent the speaker's varied speech information with a single mixture. In this paper, we propose using several mixtures per state to better represent that information. Because speaker-specific training data are insufficient, this cannot be done in every state; instead, we make the number of mixtures in each state variable, in proportion to the number of frames and to the determinant of the covariance matrix in that state. Using the proposed method, we reduced the error rate compared with methods using one mixture per state.

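The allocation rule described above, making the per-state mixture count grow with the number of aligned frames and the covariance determinant, might be sketched as follows. The scoring and rounding scheme here is an assumption for illustration, not the paper's exact formula:

```python
import numpy as np

def mixtures_per_state(frame_counts, cov_dets, max_mixtures=8):
    """Allocate 1..max_mixtures mixtures per state, proportional to the
    product of aligned-frame count and covariance determinant."""
    score = np.asarray(frame_counts, float) * np.asarray(cov_dets, float)
    score /= score.max()                      # normalize to [0, 1]
    return np.maximum(1, np.round(score * max_mixtures).astype(int))

counts = [500, 40, 5]    # frames aligned to each state (hypothetical)
dets = [2.0, 1.0, 0.5]   # covariance determinants per state (hypothetical)
print(mixtures_per_state(counts, dets))      # data-rich state gets 8, others 1
```

States with plenty of adaptation data and broad variance get many mixtures; data-starved states keep a single one.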

Speaker Segmentation System Using Eigenvoice-based Speaker Weight Distance Method (Eigenvoice 기반 화자가중치 거리측정 방식을 이용한 화자 분할 시스템)

  • Choi, Mu-Yeol;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.31 no.4 / pp.266-272 / 2012
  • Speaker segmentation is the process of automatically detecting speaker boundary points in audio data. Speaker segmentation methods fall into two categories depending on whether they use prior knowledge: model-based segmentation and metric-based segmentation. In this paper, we introduce the eigenvoice-based speaker weight distance method and compare it with representative metric-based methods. We also employ and compare the Euclidean and cosine similarity functions for calculating the distance between speaker weight vectors. Finally, we verify that the speaker weight distance method is computationally far more efficient than directly computing the distance between the speaker-adapted models constructed by the eigenvoice technique.
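The efficiency argument above rests on comparing low-dimensional eigenvoice weight vectors instead of full adapted models. A minimal sketch of the two distance functions mentioned, on hypothetical three-dimensional weight vectors:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two speaker-weight vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """1 - cosine similarity between two speaker-weight vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

w1 = np.array([0.9, 0.1, 0.0])   # hypothetical speaker-weight vectors
w2 = np.array([0.8, 0.2, 0.1])   # same-speaker region: close to w1
w3 = np.array([0.0, 0.1, 0.9])   # different speaker: far from w1

print(euclidean(w1, w2) < euclidean(w1, w3))          # True: w2 is nearer
print(cosine_distance(w1, w2) < cosine_distance(w1, w3))  # True under cosine too
```

A boundary is hypothesized wherever the distance between weight vectors estimated on adjacent windows exceeds a threshold.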

Improvements in Speaker Adaptation Using Weighted Training (가중 훈련을 이용한 화자 적응 시스템의 향상)

  • 장규철;우수영;진민호;박용규;유창동
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.188-193 / 2003
  • Model-based adaptation methods reported so far in the literature incorporate the adaptation data indiscriminately when reducing the mismatch between training and testing environments, regardless of how the adaptation data are distributed in the testing environment. When the amount of data is small and parameter tying is extensive, adaptation based on outlier data can be detrimental to recognizer performance: the distribution of the adaptation data plays a critical role in adaptation performance. To maximally improve the recognition rate in the testing environment using only a small amount of adaptation data, supervised weighted training is applied to the structural maximum a posteriori (SMAP) algorithm. We evaluate the proposed weighted SMAP (WSMAP) against SMAP on the TIDIGITS corpus, and find that WSMAP performs better for small amounts of data. The general idea of incorporating the distribution of the adaptation data is applicable to other adaptation algorithms.
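One simple way to realize the idea of discounting outlier adaptation data, not necessarily the WSMAP weighting itself, is to weight each adaptation frame by its relative likelihood under the current model before the mean update:

```python
import numpy as np

def weighted_mean_update(mu, sigma, frames):
    """Mean update with frames weighted by relative likelihood under
    a diagonal Gaussian (mu, sigma); outliers get near-zero weight."""
    ll = -0.5 * np.sum(((frames - mu) / sigma) ** 2, axis=1)
    w = np.exp(ll - ll.max())            # relative weights in (0, 1]
    return (w[:, None] * frames).sum(axis=0) / w.sum()

mu = np.zeros(2)
sigma = np.ones(2)
frames = np.vstack([np.full((20, 2), 0.3),   # well-matched adaptation data
                    np.full((1, 2), 6.0)])   # one outlier frame

plain = frames.mean(axis=0)                  # outlier drags the mean up
weighted = weighted_mean_update(mu, sigma, frames)
print(weighted[0] < plain[0])                # outlier largely suppressed
```

The unweighted mean lands near 0.57 because of the single outlier; the weighted update stays essentially at 0.3, the value of the well-matched data.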

Speaker Recognition using PCA in Driving Car Environments (PCA를 이용한 자동차 주행 환경에서의 화자인식)

  • Yu, Ha-Jin
    • Proceedings of the KSPS conference / 2005.04a / pp.103-106 / 2005
  • The goal of our research is to build a text-independent speaker recognition system that can be used in any condition without an additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that PCA (principal component analysis) without dimension reduction can greatly increase performance, to a level close to the matched condition. The error rate is reduced further by the proposed augmented PCA, which augments an axis to the feature vectors of the most confusable pairs of speakers before applying PCA.

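PCA "without dimension reduction" amounts to a full-rank rotation onto the principal axes, which decorrelates the features while keeping all of them. A minimal sketch on synthetic correlated features (the augmented-PCA step for confusable speaker pairs is not shown):

```python
import numpy as np

def pca_transform(X):
    """Full-rank PCA: center and rotate onto principal axes, keeping
    every dimension."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    _, vecs = np.linalg.eigh(cov)        # eigenvectors of the covariance
    return Xc @ vecs

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated features
Z = pca_transform(X)                                     # same dimensionality

# After the rotation the feature covariance is diagonal: dimensions are
# decorrelated, but no information has been discarded.
C = np.cov(Z, rowvar=False)
print(np.allclose(C - np.diag(np.diag(C)), 0.0, atol=1e-8))
```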

Speaker Adaptation Using Linear Transformation Network in Speech Recognition (선형 변환망을 이용한 화자적응 음성인식)

  • 이기희
    • Journal of the Korea Society of Computer and Information / v.5 no.2 / pp.90-97 / 2000
  • This paper describes a speaker-adaptive speech recognition system that achieves reliable recognition of speech signals from new speakers. In the proposed method, the speech spectrum of a new speaker is adapted to the reference speech spectrum using the parameters of a first-order linear transformation network placed in front of the phoneme classification neural network. The recognition system is based on a semicontinuous HMM (hidden Markov model) that uses a multilayer perceptron as a fuzzy vector quantizer. Isolated word recognition experiments were performed to measure the recognition rate of the system. With speaker adaptation, the recognition rate shows a significant improvement over the unadapted system.


Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training (MCE 학습 알고리즘을 이용한 문장독립형 화자식별의 성능 개선)

  • Kim Tae-Jin;Choi Jae-Gil;Kwon Chul-Hong
    • MALSORI / no.57 / pp.165-174 / 2006
  • In this paper we use the MCE (minimum classification error) training algorithm to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small-set speaker identification task show that discriminative training using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian mixture model) speaker models from a UBM (universal background model).

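The MCE criterion can be sketched for a single utterance: a misclassification measure compares the correct speaker's score against a soft maximum over the competitors, and a sigmoid turns it into a smooth loss concentrated near the decision boundary. The smoothing constants `eta` and `gamma` below are assumptions:

```python
import numpy as np

def mce_loss(scores, correct, eta=2.0, gamma=1.0):
    """MCE loss for one utterance given per-speaker log scores."""
    others = np.delete(scores, correct)
    anti = np.log(np.mean(np.exp(eta * others))) / eta  # soft max of competitors
    d = -scores[correct] + anti                          # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))              # sigmoid loss in (0, 1)

good = mce_loss(np.array([5.0, 1.0, 0.5]), correct=0)   # correctly classified
bad = mce_loss(np.array([1.0, 5.0, 0.5]), correct=0)    # misclassified
print(good < 0.5 < bad)
```

Training then nudges model parameters down the gradient of this loss, so utterances that are confidently correct or hopelessly wrong contribute little, while borderline cases drive the update.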

Voice Dialing system using Stochastic Matching (확률적 매칭을 사용한 음성 다이얼링 시스템)

  • 김원구
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.515-518 / 2004
  • This paper presents a method that improves the performance of a personal voice dialing system in which speaker-independent phoneme HMMs are used. Since the speaker-independent phoneme HMM based voice dialing system stores only the phone transcription of the input sentence, the storage space can be reduced greatly. However, the system performs worse than one using speaker-dependent models, owing to the phone recognition errors that arise when speaker-independent models are used. To solve this problem, a new method is presented that jointly estimates transformation vectors for speaker adaptation and transcriptions from training utterances. The biases and transcriptions are estimated iteratively from each user's training data by a maximum likelihood approach to stochastic matching, using speaker-independent phone models. Experimental results show that the proposed method is superior to the conventional method, which uses transcriptions only.

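For a fixed frame-to-Gaussian alignment, the maximum-likelihood bias estimate in stochastic matching reduces to the mean of the per-frame residuals against the aligned model means. The sketch below shows that single step on synthetic data; in the full method the alignments (and transcriptions) are re-estimated between iterations, which this toy does not do:

```python
import numpy as np

def estimate_bias(frames, means, align, iters=5):
    """ML bias estimate given a fixed frame-to-Gaussian alignment:
    the mean residual between frames and their aligned model means."""
    b = np.zeros(frames.shape[1])
    for _ in range(iters):                 # converges in one pass here,
        b = np.mean(frames - means[align], axis=0)  # since align is fixed
    return b

means = np.array([[0.0, 0.0], [2.0, 2.0]])   # model means (hypothetical)
align = np.array([0, 0, 1, 1])               # frame-to-Gaussian alignment
true_bias = np.array([0.5, -0.3])
frames = means[align] + true_bias            # observations shifted by the bias

print(np.round(estimate_bias(frames, means, align), 6))
```

Subtracting the estimated bias from incoming features brings them back into the speaker-independent model space before decoding.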