• Title/Summary/Keyword: Post-processing for Speech Recognition

Performance Improvement of Voice Dialing System using Post-Processing (후처리를 이용한 음성 다이얼링 시스템의 성능향상)

  • 김원구
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.9-12 / 2000
  • A voice dialing system recognizes the speaker's command and dials the destination phone number automatically. Such a system is useful for wireless handsets and portable communication devices. In a personal voice dialing system, the HMMs used for speech recognition are trained on phrases selected by the owner. Its implementation requires much less memory and computation than a speaker-independent system. Because only two or three training utterances per command are available, it is difficult to estimate the state duration distributions accurately enough to improve recognition performance. A post-processor is therefore presented to improve performance. Experiments on a database collected over telephone lines showed that the proposed post-processor improves the performance of the recognition system.

A Study on Utterance Verification Using Accumulation of Negative Log-likelihood Ratio (음의 유사도 비율 누적 방법을 이용한 발화검증 연구)

  • 한명희;이호준;김순협
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.194-201 / 2003
  • In speech recognition, confidence measurement decides whether a recognition result should be accepted. Confidence is measured by integrating frame-level scores up to the phone and word levels. In word recognition, confidence measurement verifies recognition results and detects Out-Of-Vocabulary (OOV) input. As a post-processing step, it can therefore improve recognizer performance by rejecting such input instead of accepting it as a recognition error. In this paper, we measure confidence with a modified version of the log likelihood ratio (LLR), the previous confidence measure. When integrating confidence from the frame level to the phone level, the method accumulates only the frames whose log likelihood ratio is negative. Compared with the previous method on the output of a word recognizer, the FAR (False Acceptance Ratio) decreases by about 3.49% for OOV input and by about 15.25% for recognition errors when the CAR (Correct Acceptance Ratio) is about 90%.
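
The accumulation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the normalization by segment length, the averaging over phones, and the threshold value are all assumptions made for the example.

```python
def phone_confidence(frame_llrs):
    """Phone-level confidence from frame-level log likelihood ratios.

    Only negative frame LLRs are accumulated. Dividing by the number of
    frames is an assumption made here so that long and short phones
    are comparable.
    """
    negative_sum = sum(llr for llr in frame_llrs if llr < 0.0)
    return negative_sum / len(frame_llrs)


def word_confidence(phone_segments):
    """Word-level confidence: mean of the phone-level scores (assumed)."""
    scores = [phone_confidence(segment) for segment in phone_segments]
    return sum(scores) / len(scores)


def accept(phone_segments, threshold=-0.5):
    """Accept the recognized word if its confidence reaches the threshold.

    The threshold is hypothetical; in practice it would be tuned to reach
    a target operating point such as a given CAR.
    """
    return word_confidence(phone_segments) >= threshold
```

Here each element of `phone_segments` is the list of frame LLRs for one phone of the hypothesized word. A word whose frames mostly score above zero stays close to a confidence of 0, while a word with many strongly negative frames is pushed below the threshold and rejected.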

Plosive consonants recognition using acoustic properties with the frames representing each phoneme (조음 특성과 음소 대표 구간을 이용한 우리말 파열음의 인식)

  • 박찬응;이쾌희
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.4 / pp.33-41 / 1997
  • Korean unvoiced phonemes consist of nonstationary segments, whereas vowels and nasal consonants consist of quasi-stationary segments. Moreover, phonemes that share the same place of articulation but differ in manner of articulation have similar characteristics, which makes them hard to distinguish from one another. A new method that uses the changes and characteristics of the acoustic properties of these phonemes is proposed to improve the recognition rate. Because these changes and characteristics appear clearly in continuous speech, except when some unvoiced consonants are articulated as voiced phonemes in medial position between voiced phonemes, the method can be applied easily. The features of the frames extracted to represent each phoneme are used as inputs to a hierarchical neural network. The final phoneme decision is then made through post-processing to which the new method is applied. In recognition experiments on 9 unvoiced consonants belonging to the bilabial, alveolar, and velar series, a recognition rate of 89.4% is obtained for distinguishing phonemes within the same series, and 85.6% when distinguishing between phoneme series is also included.

Large Vocabulary Continuous Speech Recognition Based on Language Model Network (언어 모델 네트워크에 기반한 대어휘 연속 음성 인식)

  • 안동훈;정민화
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.543-551 / 2002
  • In this paper, we present an efficient decoding method that runs in real time on a 20k-word continuous speech recognition task. The basic search method is a one-pass Viterbi decoder over the search space constructed from a novel language model (LM) network. On top of the consistent search space representation that the LM network derives from various language models, we incorporate basic pruning strategies; the tokens that survive pruning constitute a dynamic search space. To facilitate post-processing, the decoder subsequently produces a word graph and an N-best list. The decoder is tested on a 20k-word database and evaluated with respect to accuracy and real-time factor (RTF).
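
The abstract does not say which pruning strategies the decoder uses. As a hedged illustration of how surviving tokens come to define a dynamic search space, the sketch below combines two common choices, a score beam and a cap on the number of live tokens; the function name, the token representation, and both parameters are assumptions for this example.

```python
def beam_prune(tokens, beam_width, max_tokens):
    """Keep only the tokens that survive pruning at one time frame.

    tokens: dict mapping a search-state id to its log score (higher is better).
    beam_width: tokens scoring more than this below the best are dropped.
    max_tokens: upper bound on the number of surviving tokens.
    """
    if not tokens:
        return {}
    best = max(tokens.values())
    # Beam pruning: drop tokens too far below the best-scoring token.
    survivors = {s: sc for s, sc in tokens.items() if sc >= best - beam_width}
    # Rank-based cap: keep at most max_tokens of the remaining tokens.
    ranked = sorted(survivors.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_tokens])
```

Calling this once per frame, after token propagation, keeps the active part of the network small, which is what makes real-time decoding over a large static network feasible.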

Keyword Spotting Algorithm within a Continuous Syllable Sentence for the Post-Processing of Speech Recognition (음성 인식 후처리를 위한 연속 음절 문장의 키워드 추출 알고리즘)

  • Cho, Shi-Won;Lee, Dong-Wook
    • Proceedings of the KIEE Conference / 2008.04a / pp.170-171 / 2008
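  • The output of continuous speech recognition consists of continuous-syllable sentences with no word spacing. In the post-processing stage of speech recognition, this paper generates word phrases (eojeol) from a continuous-syllable sentence by combining an eojeol generation process based on a particle/ending dictionary with a morphological analyzer, and then extracts keywords. Experiments confirmed that the proposed algorithm achieves a higher recognition rate than a method that applies only the eojeol generator.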

A Log-Energy Feature Normalization Method Using ARMA Filter (ARMA 필터를 이용한 로그 에너지 특징의 정규화 방법)

  • Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society / v.11 no.10 / pp.1325-1337 / 2008
  • The mismatch between training and recognition environments is the major cause of degraded speech recognition performance. Various noise processing methods have been studied to resolve this mismatch. Among them, ERN (log-Energy dynamic Range Normalization) and SEN (Silence Energy Normalization), which normalize log energy features, perform better than others. However, these methods can hardly normalize the relatively high values of log energy features, and the resulting environmental mismatch grows especially in low-SNR environments. To solve these problems, we propose applying an ARMA filter as post-processing, smoothing the log energy features by computing a moving average in an auto-regressive scheme. In recognition experiments on the Aurora 2.0 DB, the proposed method shows improved results compared with conventional methods.
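
The abstract does not give the filter equation. One widely used ARMA smoother for feature sequences averages the previous M filtered values with the current and next M unfiltered values, y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1). The sketch below assumes that form; the function name, the order parameter, and the handling of the sequence edges are illustrative choices, not taken from the paper.

```python
def arma_smooth(x, order=2):
    """Smooth a log-energy sequence with an ARMA-style moving average.

    x: list of per-frame log energy values (already normalized upstream).
    order: the filter order M. The first and last M frames are copied
    unchanged (an assumed edge treatment).
    """
    n_frames = len(x)
    y = list(x)
    for t in range(order, n_frames - order):
        past_outputs = sum(y[t - order:t])                   # auto-regressive part
        current_and_future_inputs = sum(x[t:t + order + 1])  # moving-average part
        y[t] = (past_outputs + current_and_future_inputs) / (2 * order + 1)
    return y
```

Because each output reuses earlier outputs, an isolated spike in the log energy is spread over the following frames and its peak is reduced, which matches the smoothing role the abstract describes.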

A Study on a Generation of a Syllable Restoration Candidate Set and a Candidate Decrease (음절 복원 후보 집합의 생성과 후보 감소에 관한 연구)

  • 김규식;김경징;이상범
    • Journal of the Korea Computer Industry Society / v.3 no.12 / pp.1679-1690 / 2002
  • This paper describes the generation of syllable restoration rules for the post-processing of speech recognition and the reduction of restoration candidates. To improve the performance of conversational continuous speech recognition, syllable restoration rules were created for the post-processing of a recognition system that recognizes phonetic values in syllable units; the rules generate restoration candidates, that is, spellings that are pronounced as the recognized phonetic values. To reduce the number of candidates in the restoration set, the paper also proposes removing from the restoration rules those rules that generate spellings not used in practice. A restoration candidate set generator was designed and implemented to show that the syllable restoration rules produce a proper restoration candidate set. Tests on standard pronunciation examples and on words extracted at random from a pronunciation dictionary verified that a proper candidate set was generated, one that includes the spelling of the word before it was pronounced.

A study on performance improvement of neural network using output probability of HMM (HMM의 출력확률을 이용한 신경회로망의 성능향상에 관한 연구)

  • Pyo Chang Soo;Kim Chang Keun;Hur Kang In
    • Journal of the Institute of Convergence Signal Processing / v.1 no.1 / pp.1-6 / 2000
  • In this paper, a hybrid system of an HMM (Hidden Markov Model) and a neural network is proposed. The neural network acts as a post-processor that minimizes recognition errors, and the hybrid system shows a better recognition rate than the HMM alone. After the HMM is trained on the training data, test data that took no part in that training are passed to the HMM. The output probabilities that the HMM produces for these data are used as training data for the neural network post-processor. Once the neural network is trained, the hybrid system is complete. The hybrid system improves the recognition rate by about 4.5% with an MLP and by about 2% with an RBFN. It also addresses the training time of conventional hybrid systems and the drop in recognition rate caused by a lack of training data in real-time speech recognition systems.
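
The data flow of such a hybrid system can be sketched as below. The paper trains an MLP or an RBFN on the HMM output probabilities; to keep the example short and deterministic, this sketch swaps in a nearest-centroid classifier as a stand-in for the neural network. All function names and score values are invented for illustration.

```python
def hmm_decision(scores):
    """Baseline decision: index of the word model with the highest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


def train_centroids(score_vectors, labels):
    """Train the stand-in post-processor on HMM score vectors.

    score_vectors: one list of per-model HMM log scores per utterance,
    taken from data not used to train the HMMs.
    labels: the correct word index for each utterance.
    """
    sums, counts = {}, {}
    for vector, label in zip(score_vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for k, value in enumerate(vector):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def post_process(scores, centroids):
    """Final decision: label of the closest centroid in score space."""
    def squared_distance(centroid):
        return sum((s - c) ** 2 for s, c in zip(scores, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

The point of the second stage is that it sees the whole score vector rather than only its maximum, so it can learn to correct systematic biases, for example one word model that tends to score high on every utterance.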

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of a continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alterations occur when Korean is pronounced continuously. These alterations are one of the major reasons that speech recognition of Korean is difficult. This paper presents a rule-based system that converts a character string produced by speech recognition into a text-based character string. The recovery results are morphologically analyzed so that only correct text strings are generated. Recovery follows four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or phoneme dictionary, and it has the advantage that an existing text-based morphological analyzer can be used.
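
To make the idea of a recovery rule concrete, the sketch below implements one simplified rule of the syllable-boundary kind: undoing liaison, where a final consonant has been carried over as the initial consonant of the next syllable. It is an illustration only. The rule, the partial consonant table, and the small lexicon that stands in for the morphological analyzer are assumptions, not the paper's rule set. The syllable arithmetic follows the standard Unicode Hangul composition formula.

```python
HANGUL_BASE = 0xAC00        # code point of the first precomposed syllable
N_MEDIAL, N_FINAL = 21, 28  # number of medial vowels and final slots
IEUNG_INITIAL = 11          # index of the silent initial consonant

# Initial-consonant index -> final-consonant index for a few consonants
# (g, n, d, r/l, m, b, s, j). Deliberately partial.
INITIAL_TO_FINAL = {0: 1, 2: 4, 3: 7, 5: 8, 6: 16, 7: 17, 9: 19, 12: 22}


def decompose(syllable):
    """Split a precomposed Hangul syllable into (initial, medial, final)."""
    offset = ord(syllable) - HANGUL_BASE
    return (offset // (N_MEDIAL * N_FINAL),
            (offset // N_FINAL) % N_MEDIAL,
            offset % N_FINAL)


def compose(initial, medial, final):
    """Build a precomposed Hangul syllable from its three indices."""
    return chr(HANGUL_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def recovery_candidates(pronounced):
    """Return the input plus spellings obtained by undoing liaison once."""
    candidates = [pronounced]
    for i in range(len(pronounced) - 1):
        i1, m1, f1 = decompose(pronounced[i])
        i2, m2, f2 = decompose(pronounced[i + 1])
        if f1 == 0 and i2 in INITIAL_TO_FINAL:
            # Move the consonant back to the final slot of the first syllable.
            restored = (pronounced[:i]
                        + compose(i1, m1, INITIAL_TO_FINAL[i2])
                        + compose(IEUNG_INITIAL, m2, f2)
                        + pronounced[i + 2:])
            candidates.append(restored)
    return candidates


def filter_by_lexicon(candidates, lexicon):
    """Stand-in for morphological analysis: keep only known strings."""
    return [c for c in candidates if c in lexicon]
```

For the pronounced string 구거, the rule proposes 국어 as an additional spelling, and a lexicon containing 국어 keeps only that candidate. A real system would apply several rule types and pass the candidates to a morphological analyzer rather than a word list.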
