• Title/Summary/Keyword: speech task

Search Results: 316

Dialogue Strategies to Overcome Speech Recognition Errors in Form-Filling Dialogue (양식 채우기 대화에서 음성 인식 오류의 보완을 위한 대화 전략)

  • Kang Sang-Woo;Lee Song-Wook;Seo Jung-Yun
    • Korean Journal of Cognitive Science
    • /
    • v.17 no.2
    • /
    • pp.139-150
    • /
    • 2006
  • Speech recognition errors can have fatal consequences in a spoken dialogue system. When the system cannot determine the speech-act of an utterance because of speech recognition errors, the dialogue system has difficulty continuing the conversation. In this paper, we propose strategies for sub-dialogue generation that infer the speech-act of an utterance from patterns of recognition errors in the domain of form-filling dialogue. We applied the proposed method to a plan-based dialogue model, corrected 27% of incomplete tasks, and achieved an overall task completion rate of 89%. (An illustrative sketch follows this entry.)

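To make the error-compensation idea above concrete, here is a minimal sketch in Python of inferring a speech-act from an unreliable recognition hypothesis and opening a clarification sub-dialogue. It is not the authors' plan-based implementation; the slot name, confidence threshold, and prompts are hypothetical.

```python
# Minimal sketch: infer a likely speech-act for a form-filling turn when the
# recognizer is unreliable, then open a confirmation sub-dialogue.
# All slot names, thresholds, and prompts are hypothetical examples.

EXPECTED_SLOT = "departure_city"          # slot the system just asked about

def infer_speech_act(hypothesis: str, confidence: float, expected_slot: str) -> str:
    """Guess the speech-act of a possibly misrecognized utterance."""
    text = hypothesis.lower()
    if confidence < 0.4:
        # Recognition too unreliable to trust the words themselves;
        # assume the user was answering the pending question.
        return f"inform({expected_slot})"
    if text.startswith(("what", "which", "how")):
        return "ask_info"
    if text in ("yes", "no", "right", "correct"):
        return "confirm_answer"
    return f"inform({expected_slot})"

def clarification_subdialogue(speech_act: str, hypothesis: str) -> str:
    """Generate a confirmation prompt instead of failing the whole task."""
    if speech_act.startswith("inform"):
        return f"Did you say '{hypothesis}' for your departure city?"
    return "Sorry, could you repeat that?"

act = infer_speech_act("pusan", confidence=0.35, expected_slot=EXPECTED_SLOT)
print(act, "->", clarification_subdialogue(act, "pusan"))
```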

Telephone Speech Recognition with Data-Driven Selective Temporal Filtering based on Principal Component Analysis

  • Jung Sun Gyun;Son Jong Mok;Bae Keun Sung
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.764-767
    • /
    • 2004
  • The performance of a speech recognition system is generally degraded in the telephone environment because of distortions caused by background noise and varying channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task such as telephone speech. Three different temporal filtering methods are presented, with recognition results for Korean connected-digit telephone speech. The filter coefficients are derived from cepstral-domain feature vectors using principal component analysis. (A sketch of the PCA-based filter derivation follows this entry.)

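The following sketch illustrates one common way a data-driven temporal filter can be derived with principal component analysis from cepstral trajectories and applied along the time axis. It uses random data in place of real cepstra, and the window length is an arbitrary choice; the paper's exact derivation and its other filtering variants are not reproduced here.

```python
import numpy as np

# Stand-in for cepstral feature vectors: T frames x D coefficients.
rng = np.random.default_rng(0)
cepstra = rng.standard_normal((500, 13))

WIN = 9  # temporal context length (hypothetical)

def pca_temporal_filter(traj: np.ndarray, win: int) -> np.ndarray:
    """Derive an FIR temporal filter as the top principal component of
    windowed segments of a single cepstral-coefficient trajectory."""
    segments = np.stack([traj[t:t + win] for t in range(len(traj) - win + 1)])
    segments -= segments.mean(axis=0)
    cov = segments.T @ segments / len(segments)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]          # eigenvector with the largest eigenvalue

# Derive one filter per cepstral dimension and apply it along time.
filtered = np.empty_like(cepstra)
for d in range(cepstra.shape[1]):
    h = pca_temporal_filter(cepstra[:, d], WIN)
    filtered[:, d] = np.convolve(cepstra[:, d], h, mode="same")

print(filtered.shape)
```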

Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting (가변어휘 핵심어 검출을 위한 비핵심어 모델링 및 후처리 성능평가)

  • Kim, Hyung-Soon;Kim, Young-Kuk;Shin, Young-Wook
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.225-239
    • /
    • 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and the Gaussian Mixture Model (GMM) are considered. We employ a likelihood ratio scoring method in the post-processing stage to verify recognition results, and filler models, anti-subword models, and N-best decoding results are considered as alternative hypotheses for likelihood ratio scoring. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling yields better performance than monophone clustering. In the post-processing experiments, the anti-keyword model based on the Kullback-Leibler distance and the N-best decoding method perform better than the other methods, and keyword recognition errors are reduced by more than 50% at a keyword rejection rate of 5%. (A sketch of likelihood ratio verification follows this entry.)

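The post-processing step described above boils down to a likelihood ratio test between the keyword model and an alternative (filler or anti-keyword) model. A minimal sketch of that decision rule, with made-up log-likelihoods standing in for HMM decoder scores and a hypothetical threshold, is shown below.

```python
def verify_keyword(loglik_keyword: float, loglik_anti: float,
                   num_frames: int, threshold: float = 0.5) -> bool:
    """Accept a putative keyword if the frame-normalized log-likelihood ratio
    against the alternative (anti-keyword / filler) model exceeds a threshold."""
    llr = (loglik_keyword - loglik_anti) / num_frames
    return llr > threshold

# Toy example: scores an HMM decoder might return (hypothetical values).
print(verify_keyword(loglik_keyword=-1150.0, loglik_anti=-1220.0, num_frames=100))  # accepted
print(verify_keyword(loglik_keyword=-1150.0, loglik_anti=-1160.0, num_frames=100))  # rejected
```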

Selective pole filtering based feature normalization for performance improvement of short utterance recognition in noisy environments (잡음 환경에서 짧은 발화 인식 성능 향상을 위한 선택적 극점 필터링 기반의 특징 정규화)

  • Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.9 no.2
    • /
    • pp.103-110
    • /
    • 2017
  • The pole filtering concept has been successfully applied to cepstral feature normalization techniques for noise-robust speech recognition. In this paper, we propose applying pole filtering selectively, only to the speech intervals, in order to further improve recognition performance for short utterances in noisy environments. Experimental results on the AURORA 2 task with clean-condition training show that the proposed selectively pole-filtered cepstral mean normalization (SPFCMN) and selectively pole-filtered cepstral mean and variance normalization (SPFCMVN) yield error rate reductions of 38.6% and 45.8%, respectively, compared to the baseline system.
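
A minimal sketch of the selective-normalization idea: the cepstral mean is estimated from speech frames only and then subtracted. A crude energy threshold stands in for a real voice activity decision, and the pole-filtering step itself is omitted here, so this is only an illustration of the selectivity, not the authors' SPFCMN.

```python
import numpy as np

def selective_cmn(cepstra: np.ndarray, log_energy: np.ndarray) -> np.ndarray:
    """Cepstral mean normalization where the mean is estimated from
    speech frames only (selected here by a crude energy threshold)."""
    speech_mask = log_energy > log_energy.mean()   # hypothetical VAD decision
    if not speech_mask.any():                      # guard: no frame marked as speech
        speech_mask[:] = True
    speech_mean = cepstra[speech_mask].mean(axis=0)
    return cepstra - speech_mean                   # subtract the speech-only mean

rng = np.random.default_rng(1)
feats = rng.standard_normal((200, 13)) + 2.0       # toy cepstra with an offset
energy = rng.standard_normal(200)                  # toy frame energies
print(selective_cmn(feats, energy).mean(axis=0)[:3])
```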

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;Kim, Hong-Kook
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.103-106
    • /
    • 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on an active learning framework that helps select a text corpus from the large amount of text data available for language modeling. Perplexity is used as the measure for corpus selection in the active learning procedure. In recognition experiments on a continuous Korean speech task, the speech recognition system employing the language model built with the proposed approach reduces the word error rate by about 6.6%, with less computational complexity than a system using a language model constructed from randomly selected texts. (A sketch of perplexity-based selection follows this entry.)

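The corpus selection step can be pictured as scoring candidate text chunks by their perplexity under a seed language model and keeping the lowest-perplexity chunks. The sketch below uses a tiny add-one-smoothed unigram model purely for illustration; the paper's n-gram order and exact selection criterion are not reproduced.

```python
import math
from collections import Counter

def unigram_lm(text: str):
    """Train a tiny add-one-smoothed unigram LM to act as the seed model."""
    counts = Counter(text.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def perplexity(lm, sentence: str) -> float:
    words = sentence.split()
    logprob = sum(math.log(lm(w)) for w in words)
    return math.exp(-logprob / max(len(words), 1))

seed_lm = unigram_lm("please recognize continuous korean speech for this task")

candidates = [
    "recognize korean speech",          # in-domain-looking chunk
    "stock prices fell sharply today",  # out-of-domain chunk
]
# Keep the chunk(s) with the lowest perplexity under the seed LM.
selected = sorted(candidates, key=lambda s: perplexity(seed_lm, s))[:1]
print(selected)
```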

On Wavelet Transform Based Feature Extraction for Speech Recognition Application

  • Kim, Jae-Gil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.2E
    • /
    • pp.31-37
    • /
    • 1998
  • This paper proposes a feature extraction method using the wavelet transform for speech recognition. A speech recognition system generally carries out the recognition task based on speech features that are usually obtained via time-frequency representations such as the Short-Time Fourier Transform (STFT) and Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing highly complex speech characteristics, because they map the speech features with the same frequency resolution at all frequencies. The wavelet transform overcomes some of these limitations: it captures the signal with fine time resolution at high frequencies and fine frequency resolution at low frequencies, which may present a significant advantage when analyzing highly localized speech events. Based on this motivation, this paper investigates the effectiveness of the wavelet transform for feature extraction aimed at enhancing speech recognition. The proposed method is implemented using the Sampled Continuous Wavelet Transform (SCWT) and its performance is tested on a speaker-independent isolated word recognizer that discerns 50 Korean words. In particular, the effects of the mother wavelet employed and of the number of voices per octave on the performance of the proposed method are investigated, and the influence of the size of the mother wavelet is discussed. Throughout the experiments, the performance of the proposed method is compared with the most prevalent conventional method, MFCC (Mel-frequency Cepstral Coefficients). The experiments show that the recognition performance of the proposed method is better than that of MFCC, but the improvement is marginal, while, due to the increased dimensionality, the computational load of the proposed method is substantially greater than that of MFCC. (A sketch of CWT-based feature extraction follows this entry.)

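As a rough illustration of wavelet-based feature extraction, the sketch below computes a continuous wavelet transform of a toy signal with PyWavelets and averages the scalogram magnitude over fixed-length frames to form one feature vector per frame. The mother wavelet, scale range, and frame size are arbitrary assumptions, not the SCWT configuration used in the paper.

```python
import numpy as np
import pywt  # PyWavelets

fs = 16000
t = np.arange(0, 0.2, 1 / fs)
# Toy two-tone signal standing in for a speech segment.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

scales = np.arange(1, 33)                      # arbitrary scale range
coefs, _ = pywt.cwt(signal, scales, "morl")    # scalogram: (n_scales, n_samples)

FRAME = 400  # 25 ms at 16 kHz (hypothetical framing)
n_frames = len(signal) // FRAME
# One feature vector per frame: mean absolute CWT coefficient per scale.
features = np.stack([
    np.abs(coefs[:, i * FRAME:(i + 1) * FRAME]).mean(axis=1)
    for i in range(n_frames)
])
print(features.shape)   # (n_frames, n_scales)
```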

The Relationship between 3- and 5-year-old children's private speech and their mothers' scaffolding (3세와 5세 유아의 혼잣말과 어머니의 비계설정과의 관계)

  • Park, Young-Soon;Yoo, An-Jin
    • Korean Journal of Human Ecology
    • /
    • v.14 no.1
    • /
    • pp.59-68
    • /
    • 2005
  • The purpose of this study was to investigate the relationship between children's private speech during an individual session and maternal scaffolding during a mother-child session. Subjects were twenty 3-year-old children, twenty 5-year-old children, and their mothers, recruited from day-care centers in Seoul. Mother-child interaction was videotaped for 15 minutes, and maternal utterances were transcribed for the analysis of maternal scaffolding. An individual session with each child was videotaped for 15 minutes 3-5 days later, and the child's utterances were transcribed. Subcategories of maternal scaffolding were significantly related to children's private speech during the individual session, and there appeared to be an age difference in this relationship. Among the verbal scaffolding strategies (other-regulation and control) used by mothers of 3-year-olds, the praise strategy was significantly related to children's private speech; among those used by mothers of 5-year-olds, the teaching strategy was significantly related to children's private speech. Regarding maternal physical control strategies, the withdrawal of maternal physical control over the maze task over time, and the withdrawal of maternal physical control over the 5-year-olds' physical performance, were significantly related to children's private speech.


Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

  • Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
    • MALSORI
    • /
    • no.53
    • /
    • pp.75-92
    • /
    • 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems that generate prosody directly. However, the cost of corpus preparation for practical-level performance is very high, because a ToBI-labeled corpus must be constructed manually by many prosody experts and a large amount of data is normally required for accurate statistical prosody modeling. This paper proposes a new method that transcribes C-ToBI labels in Chinese speech automatically. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to it. We empirically verify the usefulness of various natural language and phonology features in building well-integrated features for the ME framework. (A sketch of ME classification follows this entry.)

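Conditional Maximum Entropy classification is equivalent to multinomial logistic regression over (typically discrete) features. The sketch below shows that formulation with scikit-learn on made-up word-level features and break-index-style labels; the feature set and label inventory are hypothetical and not the paper's.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy word-level feature dicts and C-ToBI-style break labels (all hypothetical).
X = [
    {"pos": "NN", "len": 2, "punct_after": "comma"},
    {"pos": "VV", "len": 1, "punct_after": "none"},
    {"pos": "NN", "len": 3, "punct_after": "period"},
    {"pos": "AD", "len": 1, "punct_after": "none"},
]
y = ["B2", "B0", "B3", "B0"]   # break-index labels

# Multinomial logistic regression acts as a conditional maximum entropy model.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict([{"pos": "NN", "len": 2, "punct_after": "comma"}]))
```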

Effect of Percentage of Correct Consonants and Nasalance Score on the Speech Intelligibility and Acceptability in Adults with Dysarthria (마비말장애 성인의 자음정확도와 비음치가 말명료도 및 말용인도에 미치는 영향)

  • Jang, Seon Jeong;Choi, Hyun Joo
    • 재활복지
    • /
    • v.20 no.3
    • /
    • pp.67-82
    • /
    • 2016
  • The purpose of this study was to investigate the relation and effect of PCC (Percentage of Correct Consonants) and nasalance score on speech intelligibility and acceptability in adults with dysarthria, using a reading task based on a standardized passage. Ten adults with dysarthria and sixteen normal adults participated in this study. PCC and nasalance score were measured from readings of the standardized passage, and speech intelligibility and acceptability were rated on visual analogue scales. The results were as follows. First, the nasalance score of the dysarthria group was significantly higher than that of the normal group. Second, PCC, speech intelligibility, and acceptability were significantly correlated with one another, whereas the nasalance score showed no significant correlation with speech intelligibility or acceptability. These results suggest that PCC is closely related to speech intelligibility and speech acceptability, but the nasalance score is not.
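
The reported relationships are pairwise correlations between PCC, nasalance, and listener ratings. A minimal sketch of how such correlations could be computed with SciPy is given below; the per-speaker numbers are placeholders, not the study's data.

```python
from scipy.stats import pearsonr

# Made-up per-speaker scores (placeholders, not the study's data).
pcc        = [92.1, 85.4, 78.0, 88.9, 70.3]   # percentage of correct consonants
nasalance  = [48.0, 55.2, 60.1, 45.3, 66.7]   # nasalance score (%)
intelligib = [ 8.5,  6.9,  5.2,  7.8,  4.1]   # visual analogue intelligibility ratings

for name, scores in [("PCC", pcc), ("nasalance", nasalance)]:
    r, p = pearsonr(scores, intelligib)
    print(f"{name} vs intelligibility: r={r:.2f}, p={p:.3f}")
```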

English /s/ and Korean /sʰ/-/s*/ Contrast in Seoul and Busan Dialects: A Study of Category Solidity

  • Kang, Kyoung-Ho
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.3-12
    • /
    • 2012
  • The primary goal of the current study was to examine the category solidity of Korean alveolar fricatives in the Busan and Seoul dialects of Korean. Considering the common belief that /sʰ/ and /s*/ are neutralized in Kyungsang speech, the plain /sʰ/ and fortis /s*/ fricatives of Busan speakers were examined against the same fricatives of Seoul speakers. The perceptual distance between Korean /sʰ/ and /s*/ on the one hand and English /s/ on the other was investigated using a cross-linguistic mapping method. Two experiments, a perceptual mapping task of English /s/ to Korean /sʰ/ and /s*/ and a /s*/-production task, were conducted with speakers of the Busan and Seoul dialects of Korean. The results from the perception and production experiments suggested that, at a micro-level, younger Busan speakers have less solid category stability for Korean /s*/ than Seoul speakers, although their production of /sʰ/ and /s*/ was as highly distinctive as that of Seoul speakers.