• Title, Summary, Keyword: Continuous Speech

Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech (한국 표준어 연속음성에서의 억양구와 강세구 자동 검출)

  • Lee, Ki-Young; Song, Min-Suck
    • Speech Sciences / v.7 no.2 / pp.209-224 / 2000
  • This paper proposes a method for automatically detecting intonational and accentual phrases in standard Korean continuous speech. Pauses longer than 150 msec are used to detect intonational phrases, and accentual phrases are extracted from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment consist of seven male and two female speakers reading the fable 'The Ant and the Grasshopper' and a newspaper article, 'Manmulsang', at normal speed in standard Korean. The experimental results show that the detection rate is 95% on average for intonational phrases and 73% for accentual phrases. These detection rates imply that continuous speech can be segmented into smaller units (i.e., prosodic phrases) using prosodic information, so that the objects of speech recognition can be narrowed down to words or phrases within continuous speech.
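
The pause criterion in the abstract above can be illustrated with a minimal sketch. It assumes a frame-level speech/silence decision is already available; the function name `split_on_pauses`, the 10 msec frame hop, and the boolean input format are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple


def split_on_pauses(is_speech: List[bool],
                    hop_ms: float = 10.0,
                    min_pause_ms: float = 150.0) -> List[Tuple[int, int]]:
    """Split a frame-level speech/silence sequence into phrases.

    Returns (start, end) frame indices, end exclusive. A new phrase starts
    only when the silent gap is longer than min_pause_ms (150 msec here,
    following the abstract; hop_ms is an assumed frame hop).
    """
    phrases: List[Tuple[int, int]] = []
    start = None        # first frame of the current phrase
    last_speech = None  # most recent speech frame seen so far
    for i, flag in enumerate(is_speech):
        if not flag:
            continue
        if start is None:
            start = i
        elif (i - last_speech - 1) * hop_ms > min_pause_ms:
            # The silent gap is long enough to close the current phrase.
            phrases.append((start, last_speech + 1))
            start = i
        last_speech = i
    if start is not None:
        phrases.append((start, last_speech + 1))
    return phrases
```

Shorter gaps stay inside a phrase, which is why the returned ranges may contain silent frames. Extracting accentual phrases would additionally need the syllable and pitch-contour analysis, which this sketch does not attempt.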

Adaptive Korean Continuous Speech Recognizer to Speech Rate (발화속도 적응적인 한국어 연속음 인식기)

  • Kim, Jae-Beom; Park, Chan-Kyu; Han, Mi-Sung; Lee, Jung-Hyun
    • The Transactions of the Korea Information Processing Society / v.4 no.6 / pp.1531-1540 / 1997
  • In this paper, we present an automatic Korean continuous speech recognizer that is improved by speech rate estimation and compensation methods. Automatic continuous speech recognition is significantly more difficult than isolated word recognition because of coarticulatory effects and variations in speech rate. To recognize continuous speech, methods for modeling coarticulatory effects and speech rate variations are therefore needed. Here the speech rate is measured by the change of formants, and compensation is performed by extracting relatively more feature vectors from fast speech. Coarticulatory effects are modeled by defining a set of 514 Korean diphones, and ETRI's 445-word DB is used as training speech material. Combining these methods, we implement an automatic Korean continuous speech recognizer based on the DHMM (Discrete Hidden Markov Model), which shows an improved recognition rate.

Robust Speech Detection Based on Useful Bands for Continuous Digit Speech over Telephone Networks

  • Ji, Mi-Kyongi; Suh, Young-Joo; Kim, Hoi-Rin; Kim, Sang-Hun
    • The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.113-123 / 2003
  • One of the most important problems in speech recognition is detecting the presence of speech in adverse environments. In other words, accurate detection of speech boundaries is critical to speech recognition performance. The speech detection problem becomes more severe when recognition systems are used over telephone networks, especially wireless networks and noisy environments. This paper therefore describes various speech detection algorithms for a continuous digit recognition system used over wired/wireless telephone networks, and proposes an algorithm that improves the robustness of speech detection under noisy telephone networks by using useful band selection. We compare several speech detection algorithms with the proposed one and present experimental results obtained at various SNRs. The results show that the new algorithm outperforms the other speech detection methods.

Continuous Digit Recognition Using the Weight Initialization and LR Parser

  • Choi, Ki-Hoon; Lee, Seong-Kwon; Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.15 no.2E / pp.14-23 / 1996
  • This paper is a study on a neural network for recognizing phonemes, weight initialization for reducing learning time, and an LR parser for continuous speech recognition. The neural network spots phonemes in continuous speech, and the LR parser parses the output of the neural network. The full set of phonemes to be recognized is divided into several groups according to phoneme similarity, and each group is assigned its own neural network. Each group's network consists of a network that recognizes the phonemes of its own group and a VGNN (Verify Group Neural Network), which judges whether or not the input belongs to that group. To reduce learning time, the weights of the neural networks are initialized from the learning data rather than with random values. Because the output of the neural network is not accurate, the LR parsing method applied here traces several possible paths rather than a single unique path. The parser processes the continuous speech frame by frame, accumulating the output of the neural network along the possible paths. If the accumulated path value drops below a threshold value, the path is deleted from the set of possible parsing paths. The continuous speech recognition system is applied to continuous Korean digit recognition. The recognition rate for isolated digits is 97% in the speaker-dependent case and 75% in the speaker-independent case. The recognition rate for continuous digits is 74% in the speaker-dependent case.
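
The threshold-based deletion of parsing paths described above can be sketched roughly as follows. This is a simplified illustration, not the paper's parser: it assumes one symbol per frame, uses log-probabilities as the accumulated path value, and replaces the LR grammar with a hypothetical `allowed_next` callback.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Path = Tuple[Tuple[str, ...], float]  # (symbol sequence, accumulated log-score)


def prune_paths(frames: Sequence[Dict[str, float]],
                allowed_next: Callable[[Tuple[str, ...]], Sequence[str]],
                threshold: float) -> List[Path]:
    """Extend all surviving paths frame by frame and drop weak ones.

    frames: per-frame network outputs, mapping symbol -> probability.
    allowed_next: hypothetical stand-in for the grammar; given the symbols
        so far, it returns the symbols that may follow.
    threshold: paths whose accumulated log-score falls below it are deleted.
    """
    paths: List[Path] = [((), 0.0)]
    for frame in frames:
        extended: List[Path] = []
        for symbols, score in paths:
            for sym in allowed_next(symbols):
                p = frame.get(sym, 0.0)
                if p <= 0.0:
                    continue  # symbol not supported by this frame
                new_score = score + math.log(p)
                if new_score >= threshold:
                    extended.append((symbols + (sym,), new_score))
        paths = extended
    # Best-scoring path first.
    return sorted(paths, key=lambda item: item[1], reverse=True)
```

Keeping several paths alive lets a later frame rescue a hypothesis that was not the best one early on, while the threshold bounds how many hypotheses have to be tracked.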

A Study on the Korean Continuous Speech Recognition using Adaptive Pruning Algorithm and PDT-SSS Algorithm (적응 프루닝 알고리즘과 PDT-SSS 알고리즘을 이용한 한국어 연속음성인식에 관한 연구)

  • 황철준; 오세진; 김범국; 정호열; 정현열
    • Journal of Korea Multimedia Society / v.4 no.6 / pp.524-533 / 2001
  • An efficient continuous speech recognition system for practical applications requires both real-time processing and high recognition accuracy. In this paper, we build acoustic models with the PDT-SSS algorithm and language models by iterative learning in order to improve speech recognition accuracy. In addition, an adaptive pruning algorithm is applied to continuous speech. To verify the effectiveness of the proposed method, we carried out continuous speech recognition on a Korean air flight reservation task. Experimental results show that the adopted algorithm achieves an average of 90.9% for continuous speech recognition and an average of 90.7% for word recognition accuracy including continuous speech. Applying the adaptive pruning algorithm to continuous speech reduces recognition time by about 1.2 seconds (15%) without any loss of accuracy. These results demonstrate the effectiveness of the PDT-SSS algorithm and the adaptive pruning algorithm.

A Study on Speech Period and Pitch Detection for Continuous Speech Recognition (연속음성인식을 위한 음성구간과 피치검출에 관한 연구)

  • Kim Tai Suk; Chang Jong Chil
    • Journal of Korea Multimedia Society / v.8 no.1 / pp.56-61 / 2005
  • In this paper, we propose a speech period and pitch detection method for continuous speech recognition. The method distinguishes vowels from consonants frame by frame in continuous speech. For robust extraction of the speech period in real noisy environments, it applies an energy threshold derived from the input signal. While extracting the speech period, the algorithm also uses the zero-crossing rate and short-time energy to distinguish vowels from consonants.
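
As a rough illustration of the two features named in the abstract above, the sketch below computes short-time energy and zero-crossing rate per frame and applies a simple threshold rule. The function names, the fixed thresholds, and the three-way labeling rule are assumptions made for illustration; the paper's actual decision logic is not given in the abstract.

```python
from typing import List, Sequence, Tuple


def frame_features(signal: Sequence[float],
                   frame_len: int,
                   hop: int) -> List[Tuple[float, float]]:
    """Return (short-time energy, zero-crossing rate) for each frame."""
    feats: List[Tuple[float, float]] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = list(signal[start:start + frame_len])
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def label_frames(feats: Sequence[Tuple[float, float]],
                 energy_thr: float,
                 zcr_thr: float) -> List[str]:
    """Heuristic labeling with assumed thresholds (not from the paper)."""
    labels: List[str] = []
    for energy, zcr in feats:
        if energy < energy_thr and zcr < zcr_thr:
            labels.append("silence")
        elif zcr >= zcr_thr:
            labels.append("consonant-like")  # noisy frame with many crossings
        else:
            labels.append("vowel-like")      # energetic frame with few crossings
    return labels
```

In practice the thresholds would be estimated from the input signal, for example from a stretch of background noise, rather than fixed in advance.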

A Study on Vocabulary-Independent Continuous Speech Recognition System for Intelligent Home Network System (지능형 홈네트워크 시스템을 위한 가변어휘 연속음성인식시스템에 관한 연구)

  • Lee, Ho-Woong; Jeong, Hee-Suk
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.7 no.2 / pp.37-42 / 2008
  • This paper presents a vocabulary-independent continuous speech recognition system for speech control of an intelligent home network. The study proposes a keyword-based conversational scenario of continuous natural vocabulary for recognizing natural speech commands, and a way of optimizing the recognition system by building the recognizer and its database around keywords.

Phonological Process and Word Recognition in Continuous Speech: Evidence from Coda-neutralization (음운 현상과 연속 발화에서의 단어 인지 - 종성중화 작용을 중심으로)

  • Kim, Sun-Mi; Nam, Ki-Chun
    • Phonetics and Speech Sciences / v.2 no.2 / pp.17-25 / 2010
  • This study explores whether Koreans exploit their native coda-neutralization process when recognizing words in Korean continuous speech. According to the phonological rules of Korean, the coda-neutralization process must precede the liaison process as long as liaison occurs between 'words', so that liaison consonants are coda-neutralized ones such as /b/, /d/, or /g/, rather than non-neutralized ones such as /p/, /t/, /k/, /ʧ/, /ʤ/, or /s/. Consequently, if Korean listeners use their native coda-neutralization rules when processing speech input, word recognition will be hampered when non-neutralized consonants precede vowel-initial targets. Word-spotting and word-monitoring tasks were conducted in Experiments 1 and 2, respectively. In both experiments, listeners recognized words faster and more accurately when vowel-initial target words were preceded by coda-neutralized consonants than when they were preceded by non-neutralized ones. The results show that Korean listeners exploit the coda-neutralization process when processing their native spoken language.

Speech Synthesis Based on CVC Speech Segments Extracted from Continuous Speech (연속 음성으로부터 추출한 CVC 음성세그먼트 기반의 음성합성)

  • 김재홍; 조관선; 이철희
    • The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.10-16 / 1999
  • In this paper, we propose a concatenation-based speech synthesizer using CVC (consonant-vowel-consonant) speech segments extracted from an undesigned continuous speech corpus. Natural synthetic speech can be generated by properly modeling coarticulation effects between phonemes and by using natural prosodic variations. In general, a CVC synthesis unit shows less acoustic degradation of speech quality, since concatenation points are located in the consonant region, and it can properly model the coarticulation of vowels that are affected by surrounding consonants. We analyze the characteristics and the number of required synthesis units for 4 types of speech synthesis methods that use CVC synthesis units. Furthermore, we compare the speech quality of the 4 types and propose a new synthesis method based on the most promising type in terms of speech quality and implementability. We then implement the method using the speech corpus and synthesize various examples. The CVC speech segments that are not in the speech corpus are substituted by demonstrate speech segments. Experiments demonstrate that CVC speech segments extracted from a continuous speech corpus of about 100 Mbytes can produce high-quality synthetic speech.

A study on extraction of the frames representing each phoneme in continuous speech (연속음에서의 각 음소의 대표구간 추출에 관한 연구)

  • 박찬응; 이쾌희
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.4 / pp.174-182 / 1996
  • In a continuous speech recognition system, a limited number of phonetic units such as phonemes can be used to build a system that handles an unlimited number of words. Dividing continuous speech into a string of phoneme terms before the recognition process can lower the complexity of the system. However, because of coarticulation between neighboring phonemes, it is very difficult to extract their boundaries exactly. In this paper, we propose an algorithm that extracts short terms representing each phoneme instead of extracting the phoneme boundaries. First, short terms of lower spectral change and of higher spectral change are detected. Phoneme changes are then detected by applying a distance measure to the lower-spectral-change terms, and the higher-spectral-change terms are regarded as transition terms or short phoneme terms. Finally, the lower-spectral-change terms and the mid-term of the higher-spectral-change terms are regarded as the terms representing each phoneme. Cepstral coefficients and the weighted cepstral distance are used as the speech feature and the distance measure because of their low computational complexity, and the speech data used in the experiment were recorded in silent and ordinary indoor environments. In the experiments, the proposed algorithm showed higher performance with less computational complexity than conventional segmentation algorithms, and it can be usefully applied to phoneme-based continuous speech recognition.
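
The detection of lower-spectral-change terms can be sketched as below, under stated assumptions. The index weighting w_k = k^2 is one common form of weighted cepstral distance and is assumed here, since the abstract does not give the weights; the function names and the single fixed threshold are also illustrative.

```python
from typing import List, Sequence, Tuple


def weighted_cepstral_distance(c1: Sequence[float],
                               c2: Sequence[float]) -> float:
    """Index-weighted squared distance between two cepstral vectors.

    Uses w_k = k^2 for coefficient k = 1, 2, ... (an assumed weighting).
    """
    return sum((k * (a - b)) ** 2
               for k, (a, b) in enumerate(zip(c1, c2), start=1))


def stable_segments(cepstra: Sequence[Sequence[float]],
                    threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of low spectral change.

    A range is closed whenever the frame-to-frame distance reaches the
    threshold. Single-frame ranges are dropped in this sketch.
    """
    segments: List[Tuple[int, int]] = []
    start = 0
    for t in range(1, len(cepstra)):
        if weighted_cepstral_distance(cepstra[t - 1], cepstra[t]) >= threshold:
            if t - start >= 2:
                segments.append((start, t))
            start = t
    if len(cepstra) - start >= 2:
        segments.append((start, len(cepstra)))
    return segments
```

Each returned range is a candidate representative term for one phoneme. The treatment of higher-spectral-change terms as transitions or short phonemes is not covered by this sketch.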
