• Title/Summary/Keyword: phonetic HMM

Search Result 50, Processing Time 0.02 seconds

Speaker Identification using Phonetic GMM (음소별 GMM을 이용한 화자식별)

  • Kwon Sukbong;Kim Hoi-Rin
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.185-188
    • /
    • 2003
  • In this paper, we construct phonetic GMM for text-independent speaker identification system. The basic idea is to combine of the advantages of baseline GMM and HMM. GMM is more proper for text-independent speaker identification system. In text-dependent system, HMM do work better. Phonetic GMM represents more sophistgate text-dependent speaker model based on text-independent speaker model. In speaker identification system, phonetic GMM using HMM-based speaker-independent phoneme recognition results in better performance than baseline GMM. In addition to the method, N-best recognition algorithm used to decrease the computation complexity and to be applicable to new speakers.

  • PDF

Phonetic Transcription based Speech Recognition using Stochastic Matching Method (확률적 매칭 방법을 사용한 음소열 기반 음성 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.5
    • /
    • pp.696-700
    • /
    • 2007
  • A new method that improves the performance of the phonetic transcription based speech recognition system is presented with the speaker-independent phonetic recognizer. Since SI phoneme HMM based speech recognition system uses only the phoneme transcription of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the speaker dependent system due to the phoneme recognition errors generated from using SI models. A new training method that iteratively estimates the phonetic transcription and transformation vectors is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation the stochastic matching methods are used to estimate the transformation vectors. The experiments performed over actual telephone line shows that a reduction of about 45% in the error rates could be achieved as compared to the conventional method.

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Bae Keunsung
    • MALSORI
    • /
    • no.52
    • /
    • pp.111-120
    • /
    • 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. Maximum required time of 29.2 ms for processing a frame of 32 ms of speech validates real-time operation of the implemented system.

  • PDF

An Efficient Model Parameter Compensation Method foe Robust Speech Recognition

  • Chung Yong-Joo
    • MALSORI
    • /
    • no.45
    • /
    • pp.107-115
    • /
    • 2003
  • An efficient method that compensates the HMM parameters for the noisy speech recognition is proposed. Instead of assuming some analytical approximations as in the PMC, the proposed method directly re-estimates the HMM parameters by the segmental k-means algorithm. The proposed method has shown improved results compared with the conventional PMC method at reduced computational cost.

  • PDF

Improvement of an Automatic Segmentation for TTS Using Voiced/Unvoiced/Silence Information (유/무성/묵음 정보를 이용한 TTS용 자동음소분할기 성능향상)

  • Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
    • MALSORI
    • /
    • no.58
    • /
    • pp.67-81
    • /
    • 2006
  • For a large corpus of time-aligned data, HMM based approaches are most widely used for automatic segmentation, providing a consistent and accurate phone labeling scheme. There are two methods for training in HMM. Flat starting method has a property that human interference is minimized but it has low accuracy. Bootstrap method has a high accuracy, but it has a defect that manual segmentation is required In this paper, a new algorithm is proposed to minimize manual work and to improve the performance of automatic segmentation. At first phase, voiced, unvoiced and silence classification is performed for each speech data frame. At second phase, the phoneme sequence is aligned dynamically to the voiced/unvoiced/silence sequence according to the acoustic phonetic rules. Finally, using these segmented speech data as a bootstrap, phoneme model parameters based on HMM are trained. For the performance test, hand labeled ETRI speech DB was used. The experiment results showed that our algorithm achieved 10% improvement of segmentation accuracy within 20 ms tolerable error range. Especially for the unvoiced consonants, it showed 30% improvement.

  • PDF

A Study on the Characteristics of Segmental-Feature HMM (분절특징 HMM의 특성에 관한 연구)

  • Yun Young-Sun;Jung Ho-Young
    • MALSORI
    • /
    • no.43
    • /
    • pp.163-178
    • /
    • 2002
  • In this paper, we discuss the characteristics of Segmental-Feature HMM and summarize previous studies of SFHMM. There are several approaches to reduce the number of parameters in the previous studies. However, if the number of parameters decreased, the performance of systems also fell. Therefore, we consider the fast computation approach with preserving the same number of parameters. In this paper, we present the new segment comparison method to speed up the computation of SFHMM without loss of performance. The proposed method uses the three-frame calculation rather than the full(five) frames in the given segment. The experimental results show that the performance of the proposed system is better than that of the previous studies.

  • PDF

A Robust Speaker Identification Using Optimized Confidence and Modified HMM Decoder (최적화된 관측 신뢰도와 변형된 HMM 디코더를 이용한 잡음에 강인한 화자식별 시스템)

  • Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-Yu
    • MALSORI
    • /
    • no.64
    • /
    • pp.121-135
    • /
    • 2007
  • Speech signal is distorted by channel characteristics or additive noise and then the performances of speaker or speech recognition are severely degraded. To cope with the noise problem, we propose a modified HMM decoder algorithm using SNR-based observation confidence, which was successfully applied for GMM in speaker identification task. The modification is done by weighting observation probabilities with reliability values obtained from SNR. Also, we apply PSO (particle swarm optimization) method to the confidence function for maximizing the speaker identification performance. To evaluate our proposed method, we used the ETRI database for speaker recognition. The experimental results showed that the performance was definitely enhanced with the modified HMM decoder algorithm.

  • PDF

A Study on the Voice Conversion with HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 음색변환에 관한 연구)

  • Kim, Il-Hwan;Bae, Keun-Sung
    • MALSORI
    • /
    • v.68
    • /
    • pp.65-74
    • /
    • 2008
  • A statistical parametric speech synthesis system based on the hidden Markov models (HMMs) has grown in popularity over the last few years, because it needs less memory and low computation complexity and is suitable for the embedded system in comparison with a corpus-based unit concatenation text-to-speech (TTS) system. It also has the advantage that voice characteristics of the synthetic speech can be modified easily by transforming HMM parameters appropriately. In this paper, we present experimental results of voice characteristics conversion using the HMM-based Korean speech synthesis system. The results have shown that conversion of voice characteristics could be achieved using a few sentences uttered by a target speaker. Synthetic speech generated from adapted models with only ten sentences was very close to that from the speaker dependent models trained using 646 sentences.

  • PDF

Large Scale Voice Dialling using Speaker Adaptation (화자 적응을 이용한 대용량 음성 다이얼링)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.4
    • /
    • pp.335-338
    • /
    • 2010
  • A new method that improves the performance of large scale voice dialling system is presented using speaker adaptation. Since SI (Speaker Independent) based speech recognition system with phoneme HMM uses only the phoneme string of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the speaker dependent system due to the mismatch between the input utterance and the SI models. A new method that estimates the phonetic string and adaptation vectors iteratively is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation the stochastic matching methods are used to estimate the adaptation vectors. The experiments performed over actual telephone line shows that proposed method shows better performance as compared to the conventional method. with the SI phonetic recognizer.

Phonetic Acoustic Knowledge and Divide And Conquer Based Segmentation Algorithm (음성학적 지식과 DAC 기반 분할 알고리즘)

  • Koo, Chan-Mo;Wang, Gi-Nam
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.215-222
    • /
    • 2002
  • This paper presents a reliable fully automatic labeling system which fits well with languages having well-developed syllables such as in Korean. The ASL System utilize DAC (Divide and Conquer), a control mechanism, based segmentation algorithm to use phonetic and acoustic information with greater efficiency. The segmentation algorithm is to devide speech signals into speechlets which is localized speech signal pieces and to segment each speechlet for speech boundaries. While HMM method has uniform and definite efficiencies, the suggested method gives framework to steadily develope and improve specified acoustic knowledges as a component. Without using statistical method such as HMM, this new method use only phonetic-acoustic information. Therefore, this method has high speed performance, is consistent extending the specific acoustic knowledge component, and can be applied in efficient way. we show experiment result to verify suggested method at the end.