Search | Korea Science

Design of a Korean Speech Recognition Platform (한국어 음성인식 플랫폼의 설계)

Kwon Oh-Wook;Kim Hoi-Rin;Yoo Changdong;Kim Bong-Wan;Lee Yong-Ju
MALSORI / no.51 / pp.151-165 / 2004
A Korean speech recognition platform is designed for educational and research purposes. It is based on an object-oriented architecture and can be easily modified, so researchers can readily evaluate the performance of a recognition algorithm of interest. The platform is intended to save development time for those interested in speech recognition. It includes the following modules: noise reduction, end-point detection, mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder can handle both lexical search trees for large-vocabulary speech recognition and finite-state networks for small-to-medium-vocabulary speech recognition. It performs a word-dependent n-best search with a bigram language model in the first forward search stage, then extracts a word lattice and rescores each lattice path with a trigram language model in the second stage.
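The second-stage rescoring described in the abstract can be illustrated with a minimal sketch. It assumes the lattice has already been expanded into a list of word paths with first-pass scores; the function names, the toy trigram table, and the flat back-off penalty are all hypothetical and are not taken from the platform itself.

```python
def trigram_logprob(words, trigram_lp, backoff_lp=-5.0):
    """Sum trigram log-probabilities over a sentence padded with <s> and </s>.

    Unseen trigrams receive a flat penalty (an assumption; a real language
    model would back off to bigram and unigram estimates).
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        key = (padded[i - 2], padded[i - 1], padded[i])
        total += trigram_lp.get(key, backoff_lp)
    return total


def rescore(paths, trigram_lp, lm_weight=1.0):
    """Rescore (words, acoustic_logprob) paths and return them best first."""
    scored = [
        (acoustic + lm_weight * trigram_logprob(words, trigram_lp), words)
        for words, acoustic in paths
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical first-pass output: two competing paths with acoustic scores.
paths = [
    (["wreck", "a", "nice", "beach"], -9.5),
    (["recognize", "speech"], -10.0),
]
# Hypothetical trigram log-probabilities.
trigram_lp = {
    ("<s>", "<s>", "recognize"): -1.0,
    ("<s>", "recognize", "speech"): -1.0,
    ("recognize", "speech", "</s>"): -1.0,
}
best_score, best_words = rescore(paths, trigram_lp)[0]
```

In this toy case the path with the slightly worse acoustic score wins after rescoring because the trigram model covers it, which is the effect a second pass with a stronger language model is meant to provide.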

Reduction and Frequency Analyses of Vowels and Consonants in the Buckeye Speech Corpus

Yang, Byung-Gon
Phonetics and Speech Sciences / v.4 no.3 / pp.75-83 / 2012
This study had three aims: first, to examine how far the actual speech of American English speakers deviates from dictionary-prescribed symbols; second, to measure the frequency of vowel and consonant production by American English speakers; and third, to investigate gender differences in the segmental sounds in a speech corpus. The Buckeye Speech Corpus was recorded by forty American male and female subjects for one hour per subject. The vowels and consonants in both the phonemic and phonetic transcriptions were extracted from the original files of the corpus, and their frequencies were obtained using scripts written in R, a free software environment. The results were as follows. First, the American English speakers produced a reduced number of vowels and consonants in daily conversation. The reduction rate from the dictionary transcriptions to the actual transcriptions was around 38.2%. Second, the American English speakers used more front high and back low vowels, while stops, fricatives, and nasals accounted for three-fourths of the consonants. This indicates that the segmental inventory has a nonlinear frequency distribution in the speech corpus. Third, the two gender groups produced vowels and consonants similarly, although there were a few noticeable differences in their speech. From these results we propose that English teachers consider pronunciation education that reflects actual speech sounds and that linguists find a way to establish unmarked segmentals from speech corpora.
https://doi.org/10.13064/KSSS.2012.4.3.075
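A rough sketch of the kind of counting the abstract describes is given below, in Python rather than the R used in the study. The abstract does not state how the reduction rate was computed, so the definition here (the share of dictionary segments missing from the actual transcription) is an assumption, and the segment labels and words are made up for illustration.

```python
from collections import Counter


def segment_frequencies(transcriptions):
    """Relative frequency of each segment across a list of transcribed words."""
    counts = Counter(seg for word in transcriptions for seg in word)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


def reduction_rate(phonemic, phonetic):
    """Assumed definition: fraction of dictionary segments not realized."""
    n_dict = sum(len(word) for word in phonemic)
    n_actual = sum(len(word) for word in phonetic)
    return (n_dict - n_actual) / n_dict


# Hypothetical dictionary (phonemic) and actual (phonetic) transcriptions.
phonemic = [["p", "r", "aa", "b", "ax", "b", "l", "iy"], ["ae", "n", "d"]]
phonetic = [["p", "r", "aa", "b", "l", "iy"], ["n"]]

rate = reduction_rate(phonemic, phonetic)
freqs = segment_frequencies(phonetic)
```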

Speech Quality Measure in a Mobile Communication System Using PLP Cepstral Distance with CMS (심리 음향 켑스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가)

Yun, J.J.;Park, S.W.;Park, Y.C.;Youn, D.H.;Cha, I.H.
Speech Sciences / v.6 / pp.163-179 / 1999
Setting up, managing, and repairing a mobile communication system requires continuous estimation of speech quality. Speech quality can be measured by listeners' judgements in a subjective test such as the MOS (Mean Opinion Score) test. Because this method is laborious, expensive, and time-consuming, it is advisable to predict subjective speech quality via objective measures. This paper presents a robust objective speech quality measure, PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), which can predict subjective speech quality in mobile communication systems. PLP-CMS correlates highly with subjective quality owing to PLP (Perceptual Linear Predictive) analysis, and it is robust to PSTN (Public Switched Telephone Network) channel effects owing to CMS (Cepstral Mean Subtraction). To evaluate the proposed algorithm, we carried out subjective and objective quality estimation on speech samples distorted in various ways in a real mobile communication system. The results show that PLP-CMS has a higher correlation with subjective quality than PSQM (Perceptual Speech Quality Measure) and PLP-CD (Perceptual Linear Predictive-Cepstral Distance).
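The role of CMS can be illustrated with a small sketch. A fixed linear channel appears as a roughly constant additive offset in the cepstral domain, so subtracting the per-utterance cepstral mean removes it before the distance is computed. The code below is not the paper's PLP-CMS measure: the two-dimensional "cepstra", the frame alignment, and the plain Euclidean distance are all simplifying assumptions.

```python
import math


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]


def mean_cepstral_distance(reference, degraded):
    """Average Euclidean distance between frame-aligned cepstral vectors."""
    dists = [
        math.sqrt(sum((r - d) ** 2 for r, d in zip(ref_frame, deg_frame)))
        for ref_frame, deg_frame in zip(reference, degraded)
    ]
    return sum(dists) / len(dists)


# Hypothetical reference cepstra and a copy shifted by a constant channel offset.
reference = [[1.0, 2.0], [3.0, 0.0]]
channel_offset = [0.5, -0.5]
degraded = [[c + o for c, o in zip(frame, channel_offset)] for frame in reference]

distance_raw = mean_cepstral_distance(reference, degraded)
distance_cms = mean_cepstral_distance(
    cepstral_mean_subtraction(reference),
    cepstral_mean_subtraction(degraded),
)
```

Without CMS the channel offset alone produces a nonzero distance even though the speech is otherwise identical; after CMS the distance for this toy input drops to zero, so only genuine distortion would contribute to the score.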

Acoustic characteristics of Motherese

Shim, Hee-Jeong;Lee, GeonJae;Hwang, JinKyung;Ko, Do-Heung
Phonetics and Speech Sciences / v.6 no.4 / pp.189-194 / 2014
Objective: This study aims to investigate the speech rate, the length of a pause, habitual pitch, and voice intensity of motherese. Subjects and Methods: The research participants comprised 20 mothers (mean age 33 years). Speech data were collected and analyzed using the Real-time Pitch software (KayPENTAX(R)). Results: The average speech rate was 5.33 syllables per second without their infant present and 4.26 syllables per second with their infant present. The average pause length was 1.09 s without their infant present and 1.56 s with their infant present. The average habitual pitch was 199.79 Hz without their infant present and 227.15 Hz with their infant present. The average voice loudness was 61.09 dB without their infant present and 64.49 dB with their infant present. Conclusion: This study presented clinical information for efficiently managing the speech therapy issues of infants and children. This includes proper acoustic and phonological information to recommend to main caregivers.
https://doi.org/10.13064/KSSS.2014.6.4.189

Preliminary study of the perceptual and acoustic analysis on the speech rate of normal adult: Focusing the differences of the speech rate according to the area (정상 성인 말속도의 청지각적/음향학적 평가에 관한 기초 연구: 지역에 따른 말속도 차이를 중심으로)

Lee, Hyun-Joung
Phonetics and Speech Sciences / v.6 no.3 / pp.73-77 / 2014
The purpose of this study is to investigate regional differences in speech rate using perceptual and acoustic analysis. It examines regional variation in overall speech rate and articulation rate across three speaking situations (picture description, free conversation, and story retelling) with 14 normal adults (7 from the Gyeongnam area and 7 from the Honam area). In the picture description task, the perceptual speech rate differed significantly between the two regional varieties of Korean: the Honam speakers were perceived as speaking significantly faster than the Gyeongnam speakers. However, the acoustic analysis of the same task showed no difference in speech rate between the two groups. In the other two speaking situations, free conversation and story retelling, there were significant regional differences in both overall speech rate and articulation rate. These results suggest that future research should include perceptual evaluation of free conversation and story retelling, and that further studies of speech rate are needed under varied conditions, including more regions and SLPs with wider backgrounds and experience. SLPs need more training and experience to assess patients properly and reliably.
https://doi.org/10.13064/KSSS.2014.6.3.073

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

Chung, Yong-Joo
Speech Sciences / v.15 no.2 / pp.55-65 / 2008
Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. To select the reference HMM (hidden Markov model) that best matches the noise type and SNR (signal-to-noise ratio) of the input test speech, the multi-model framework has estimated the SNR value with a VAD (voice activity detection) algorithm and classified the noise type with a GMM (Gaussian mixture model) as two separate steps. Because the SNR estimation process is vulnerable to errors, we propose an efficient method that classifies the SNR value and noise type simultaneously. The classification uses the KL (Kullback-Leibler) distance between single Gaussian distributions of the noise signal in training and testing. Recognition experiments on the Aurora 2 database show the usefulness of the model compensation method in the multi-model based speech recognizer. Further performance improvement was achieved by combining the probability density function of the MCT (multi-condition training) with that of the reference HMM compensated by the D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
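The KL-based selection step can be sketched as follows. The closed form for the KL divergence between two Gaussians is standard; everything else is an assumption made for illustration, including the diagonal covariances, the direction of the divergence (test relative to reference), and the hypothetical noise labels and statistics.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance, summed over dimensions."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def classify_noise(test_stats, references):
    """Return the (noise type, SNR) label whose Gaussian is closest to the test noise."""
    mu_t, var_t = test_stats
    return min(
        references,
        key=lambda label: kl_diag_gauss(mu_t, var_t, *references[label]),
    )


# Hypothetical training-time noise Gaussians keyed by (noise type, SNR in dB).
references = {
    ("babble", 10): ([0.0, 1.0], [1.0, 1.0]),
    ("car", 5): ([3.0, -1.0], [2.0, 0.5]),
}
# Hypothetical statistics estimated from the noise portion of a test utterance.
test_stats = ([2.8, -0.9], [1.8, 0.6])

selected = classify_noise(test_stats, references)
```

Keying the references by a (noise type, SNR) pair is what lets a single nearest-model lookup decide both attributes at once, in the spirit of the simultaneous classification the abstract proposes.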

The effect of voice quality on speech intelligibility in children with spastic cerebral palsy (경직형 뇌성마비 아동의 음질이 말명료도에 미치는 영향)

Jeong, Pil Yeon;Sim, Hyun Sub
Phonetics and Speech Sciences / v.9 no.4 / pp.129-136 / 2017
This study investigates the effect of voice quality on speech intelligibility and the relationship between voice quality and intelligibility for children with spastic CP. We recruited 36 children with spastic CP (mean age 10.43 years, 17 girls, 19 boys, spastic type 34, mixed 2) from a special school and a rehabilitation hospital. Voice samples for the perceptual analysis of voice quality were extracted from a sustained vowel /a/ and were rated on the GRBAS scales by two experienced speech language pathologists. Ten adult subjects with no hearing problems evaluated speech intelligibility for the 37 words listed in the Assessment of Phonology and Articulation for Children on a 7-point interval scale. The children with spastic CP were divided into three groups according to the rated G scores on the GRBAS scales (G1(n)=10, G2(n)=13, G3(n)=13). ANCOVA and Pearson correlation analyses showed a significant difference in speech intelligibility among the three groups. There was also a significant correlation between speech intelligibility and the G (grade), A (asthenia), and B (breathy) scale scores. These findings suggest that poor speech intelligibility in spastic CP might be related to asthenia and breathiness. Speech therapy that aims to improve the speech intelligibility of children with spastic CP should increase vocal intensity and improve vocal functioning.
https://doi.org/10.13064/KSSS.2017.9.4.129

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
Phonetics and Speech Sciences / v.10 no.2 / pp.77-84 / 2018
The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.
https://doi.org/10.13064/KSSS.2018.10.2.077

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

Kim, Jin-Young;Eom, Ki-Wan
Speech Sciences / v.2 / pp.25-44 / 1997
When listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and voice colors.

Vowel Space Area and Speech Intelligibility of Children with Cochlear Implants (인공와우이식 아동의 모음공간면적과 말명료도)

Park, Hyemi;Huh, Myungjin
Phonetics and Speech Sciences / v.6 no.2 / pp.89-96 / 2014
This study measured speech intelligibility in relation to the vowel space area and listener perception through acoustic analysis of the speech of children who had received cochlear implants. It also provides basic data for the evaluation of speech intelligibility by analyzing the correlation between the vowel space area and speech intelligibility. The vowel space area was computed from $F_1$ and $F_2$ values obtained from children three years after they received cochlear implants and was compared with that of normal children, and speech intelligibility was measured through interval scaling. A product-moment correlation analysis was conducted to investigate the correlation. Results showed that the vowel space area of the children who had received cochlear implants was significantly different from that of the normal children, though their speech intelligibility was similar to that of the normal children. The correlation analysis showed no significant correlation between the vowel space area and speech intelligibility. These results suggest that the period over which intelligibility improves after cochlear implantation and objective standards for the vowel space area could be established. In addition, acoustic rating is needed to increase the accuracy of objective measurement in the evaluation of speech intelligibility.
https://doi.org/10.13064/KSSS.2014.6.2.089
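One common way to obtain a vowel space area from $F_1$ and $F_2$ is to treat the corner vowels as polygon vertices in the formant plane and apply the shoelace formula. The abstract does not say which vowels or which formula the study used, so the sketch below is only an assumed illustration, and the formant values are hypothetical.

```python
def vowel_space_area(points):
    """Polygon area by the shoelace formula.

    points: (F2, F1) pairs in Hz, listed in order around the polygon.
    The result is in Hz squared.
    """
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1]
        - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical mean formants (F2, F1) for three corner vowels of one speaker.
corner_vowels = {
    "i": (2300.0, 300.0),
    "a": (1300.0, 800.0),
    "u": (900.0, 350.0),
}
area = vowel_space_area([corner_vowels[v] for v in ("i", "a", "u")])
```

The per-speaker areas computed this way could then be paired with intelligibility ratings for the product-moment correlation mentioned in the abstract.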
