• Title/Summary/Keyword: Speech Data

Search Result 1,384, Processing Time 0.032 seconds

Use of Acoustic Analysis for Indivisualised Therapeutic Planning and Assessment of Treatment Effect in the Dysarthric Children (조음장애 환아에서 개별화된 치료계획 수립과 효과 판정을 위한 음향음성학적 분석방법의 활용)

  • Kim, Yun-Hee;Yu, Hee;Shin, Seung-Hun;Kim, Hyun-Gi
    • Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.19-35
    • /
    • 2000
  • Speech evaluation and treatment planning for the patients with articulation disorders have traditionally been based on perceptual judgement by speech pathologists. Recently, various computerized speech analysis systems have been developed and commonly used in clinical settings to obtain the objective and quantitative data and specific treatment strategies. 10 dysarthric children (6 neurogenic and 4 functional dysarthria) participated in this experiment. Speech evaluation of dysarthria was performed in two ways; first, the acoustic analysis by Visi-Pitch and a Computerized Speech Lab and second, the perceptual scoring of phonetic errors rates in 100 word test. The results of the initial evaluation served as primary guidlines for the indivisualized treatment planning of each patient's speech problems. After mean treatment period of 5 months, the follow-up data of both dysarthric groups showed increased maximum phonation time, increased alternative motion rate and decreased occurrence of articulatory deviation. The changes of acoustic data and therapeutic effects were more prominent in children with dysarthria due to neurologic causes than with functional dysarthria. Three cases including their pre- and post treatment data were illustrated in detail.

  • PDF

A VQ Codebook Design Based on Phonetic Distribution for Distributed Speech Recognition (분산 음성인식 시스템의 성능향상을 위한 음소 빈도 비율에 기반한 VQ 코드북 설계)

  • Oh Yoo-Rhee;Yoon Jae-Sam;Lee Gil-Ho;Kim Hong-Kook;Ryu Chang-Sun;Koo Myoung-Wa
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.37-40
    • /
    • 2006
  • In this paper, we propose a VQ codebook design of speech recognition feature parameters in order to improve the performance of a distributed speech recognition system. For the context-dependent HMMs, a VQ codebook should be correlated with phonetic distributions in the training data for HMMs. Thus, we focus on a selection method of training data based on phonetic distribution instead of using all the training data for an efficient VQ codebook design. From the speech recognition experiments using the Aurora 4 database, the distributed speech recognition system employing a VQ codebook designed by the proposed method reduced the word error rate (WER) by 10% when compared with that using a VQ codebook trained with the whole training data.

  • PDF

A Basic Study on the Development of a Grading Scale of Discourse Competence in Korean Speaking Assessment -Focusing on the Scale of 'REFUSAL' Task (한국어 말하기 평가에서 '담화 능력' 등급 기술을 위한 기초 연구 -'부탁'에 대한 '거절하기' 과제를 중심으로-)

  • Lee, Haeyong;Lee, Hyang
    • Journal of Korean language education
    • /
    • v.29 no.3
    • /
    • pp.255-292
    • /
    • 2018
  • Most grading scales of Korean language proficiency tests are based on existing grading scales that are not empirically verified. The purpose of this study is to develop an empirically verified scale descriptor. The 'Performance data-driven approach' that is suggested by Fulcher (1987) was used to develop the detailed description of characteristics for each level of performance. This study is focused on the functional phase of speech samples analysis (coding data) to create explanatory categories of discourse skills into which individual observations of speech phenomena can be scored. The speech samples that were collected through this study demonstrated stages of speech that can be a foundation of a grading scale. The data used in the study was collected from 23 native speakers of Korean. Speech samples were recorded from simulated speaking tests using the 'REFUSAL' task, and transcribed for analysis. The transcript was analyzed using discourse analysis. The result showed that the 'REFUSAL' task needs to go through four functional phases in actual communication. Furthermore, this study found specific and detailed explanatory categories of discourse competence based on the actual native speaker's speech data. Such findings are expected to contribute to the development of more valid and reliable speaking assessment.

Performance of Vocabulary-Independent Speech Recognizers with Speaker Adaptation

  • Kwon, Oh Wook;Un, Chong Kwan;Kim, Hoi Rin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.1E
    • /
    • pp.57-63
    • /
    • 1997
  • In this paper, we investigated performance of a vocabulary-independent speech recognizer with speaker adaptation. The vocabulary-independent speech recognizer does not require task-oriented speech databases to estimate HMM parameters, but adapts the parameters recursively by using input speech and recognition results. The recognizer has the advantage that it relieves efforts to record the speech databases and can be easily adapted to a new task and a new speaker with different recognition vocabulary without losing recognition accuracies. Experimental results showed that the vocabulary-independent speech recognizer with supervised offline speaker adaptation reduced 40% of recognition errors when 80 words from the same vocabulary as test data were used as adaptation data. The recognizer with unsupervised online speaker adaptation reduced abut 43% of recognition errors. This performance is comparable to that of a speaker-independent speech recognizer trained by a task-oriented speech database.

  • PDF

The Validation of Speech Recognition Performance according to Microphones (마이크로폰의 종류에 따른 음성인식성능의 검토)

  • Kim Yoen-Whon;Lee Kwang-Hyun;Jung Young-Jo;Kim Bong-Wan;Lee Yong-Ju
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.183-186
    • /
    • 2003
  • Speech recognition performance depends on various factors. One of the factors is the characteristic of a microphone which is used when speech data is collected. Thus, in the present experiment speech databases for tests are created through varying types of microphones. Then, acoustic models are built based on these databases, and each of the acoustic models is assessed by the data to determine recognition performance depending on various microphones.

  • PDF

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.11 no.3
    • /
    • pp.39-47
    • /
    • 2019
  • We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals, and convert them to fixed-length latent vectors using an long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing the pronunciation sequence of each word with the highest conditional probability. In order to analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment duration. Through speech recognition experiments using a larger training data set than our own previous work, we confirm that the new phone set yields better performance than the conventional phoneme-based and grapheme-based units in both spontaneous speech recognition and read speech recognition.

Coding Method of Variable Threshold Dual Rate ADPCM Speech Considering the Background Noise (배경 잡음환경에서 가변 임계값에 의한 Dual Rate ADPCM 음성 부호화 기법)

  • 한경호
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.17 no.6
    • /
    • pp.154-159
    • /
    • 2003
  • In this paper, we proposed variable threshold dual rate ADPCM coding method which adapts two coding rates of the standard ADPCM of ITU G.726 for speech quality improvement at a comparably low coding rates. The ZCR(Zero Crossing Rate) is computed for speecd data and under the noisy environment, noise data dominant region showed higher ZCR and speech data dominant region showed lower ZCR. The speech data with the higher ZCR is encoded by low coding rate for reduced coded data and the speech data with the lower ZCR is encoded by high coding rate for speech quality improvements. For coded data, 2 bits are assigned for low coding rate of 16[Kbps] and 5 bits are is assigned for high coding rate of 40[Kbps]. Through the simulation, the proposed idea is evaluated and shown that the variable dual rate ADPCM coding technique shows the qood speech quality at low coding rate.

An Experimental Study of Korean Dialectal Speech (한국어 방언 음성의 실험적 연구)

  • Kim, Hyun-Gi;Choi, Young-Sook;Kim, Deok-Su
    • Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.49-65
    • /
    • 2006
  • Recently, several theories on the digital speech signal processing expanded the communication boundary between human beings and machines drastically. The aim of this study is to collect dialectal speech in Korea on a large scale and to establish a digital speech data base in order to provide the data base for further research on the Korean dialectal and the creation of value-added network. 528 informants across the country participated in this study. Acoustic characteristics of vowels and consonants are analyzed by Power spectrum and Spectrogram of CSL. Test words were made on the picture cards and letter cards which contained each vowel and each consonant in the initial position of words. Plot formants were depicted on a vowel chart and transitions of diphthongs were compared according to dialectal speech. Spectral times, VOT, VD, and TD were measured on a Spectrogram for stop consonants, and fricative frequency, intensity, and lateral formants (LF1, LF2, LF3) for fricative consonants. Nasal formants (NF1, NF2, NF3) were analyzed for different nasalities of nasal consonants. The acoustic characteristics of dialectal speech showed that young generation speakers did not show distinction between close-mid /e/ and open-mid$/\epsilon/$. The diphthongs /we/ and /wj/ showed simple vowels or diphthongs depending to dialect speech. The sibilant sound /s/ showed the aspiration preceded to fricative noise. Lateral /l/ realized variant /r/ in Kyungsang dialectal speech. The duration of nasal consonants in Chungchong dialectal speech were the longest among the dialects.

  • PDF

A Study af Speech Rate and Fluency in Narmal Speakers (정상 성인의 말속도 및 유창성 연구)

  • Shin, Moon-Ja;Han, Sook-Ja
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.159-168
    • /
    • 2003
  • The purpose of this study was to assess the speech rate, fluency and the type of dysfluencies of normal adults in order to provide a basic data of normal speaking. The number of subjects of this study were 30(14 females and 16 males), and their ages ranged 17 to 36. The rate was measured as syllables per minute (SPM). The speech rates in reading ranged 273-426 with a mean of 348 SPM and in speaking ranges 118-409 (mean=265). The average of their fluencies was 99.1% in reading and 96.9% in speaking. The rater reliability of speech rate in the data assessed by video was very high (r=0.98) and the rater reliability of speech fluency was moderately high (r=0.67). The disfluency types were also analysed from 150 disfluency episodes. Syllable repetition and word interjection were the most common disfluent types.

  • PDF

An Investigation for Design and Implementation of an Integrated Data Management System of Various Speech Corpora (다양한 음성코퍼스의 통합관리시스템의 설계 및 구현에 관한 검토)

  • Hwang Kyunghun;Jeong Changwon;Kim Youngil;Kim Bongwan;Lee Yongju
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.69-72
    • /
    • 2003
  • In this paper, we investigate various factors that are relevant to design and implementation of an integrated management system for various speech corpora. The purpose of this paper is to manage an integrated management system for various kinds of speech corpora necessary for speech research and speech corpora consrtructed in different data formats. In addition, ways are considered to allow users to search with effect for speech corpora that meet various conditions which they want, and to allow them to add with ease corpora that are constructed newly. In order to achieve this goal, we design a global schema for an integrated management of new additional information without changing old speech corpora, and construct a web-based integrated management system based on the scheme that can be accessed without any temporal and spatial restrictions. And we show the steps by which these can be implemented, and describe related future study topics, examining the system.

  • PDF