• Title/Summary/Keyword: Speech Rates

Comparing the effects of letter-based and syllable-based speaking rates on the pronunciation assessment of Korean speakers of English

  • Hyunsong Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.1-10
    • /
    • 2023
  • This study investigated the relative effectiveness of letter-based versus syllable-based measures of speech rate and articulation rate in predicting the articulation score, prosody fluency, and rating sum using "English speech data of Koreans for education" from AI Hub. We extracted and analyzed 900 utterances from the training data, including three balanced age groups (13, 19, and 26 years old). The study built three models that best predicted the pronunciation assessment scores using linear mixed-effects regression and compared the predicted scores with the actual scores from the validation data (n=180). The correlation coefficients between them were also calculated. The findings revealed that syllable-based measures of speech and articulation rates were more effective than letter-based measures in all three pronunciation assessment categories. The correlation coefficients between the predicted and actual scores ranged from .65 to .68, indicating the models' good predictive power. However, it remains inconclusive whether speech rate or articulation rate is more effective.
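
The distinction behind these measures is simple: speech rate divides the unit count (letters or syllables) by the total utterance duration including pauses, while articulation rate divides by phonation time with pauses excluded. A minimal Python sketch with hypothetical counts and durations (the paper's alignment and counting pipeline is not shown):

```python
# Illustrative sketch of the four rate measures compared in the study.

def rate_measures(n_letters, n_syllables, total_dur, pause_dur):
    """Letter- and syllable-based speech/articulation rates (units per second)."""
    phonation_dur = total_dur - pause_dur        # time spent actually articulating
    return {
        "letter_speech_rate": n_letters / total_dur,
        "letter_articulation_rate": n_letters / phonation_dur,
        "syllable_speech_rate": n_syllables / total_dur,           # pauses included
        "syllable_articulation_rate": n_syllables / phonation_dur, # pauses excluded
    }

# e.g. a 4.2 s utterance with 0.8 s of pauses, 47 letters, 14 syllables
print(rate_measures(47, 14, 4.2, 0.8))
```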

A study on the speech feature extraction based on the hearing model

  • 김바울;윤석현;홍광석;박병철
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.4
    • /
    • pp.131-140
    • /
    • 1996
  • In this paper, we propose a method that extracts speech features based on a hearing model using signal processing techniques. The proposed method consists of the following steps: normalization of the short-time speech block by its maximum value, multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform, differentiation after analysis and synthesis, and full-wave rectification and integration. To verify the performance of the proposed speech feature in speech recognition tasks, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. The results showed that, with DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks respectively, and with VQ-HMM, the rates were 96.5% and 81.5% respectively. This indicates that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.
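
The listed processing steps map naturally onto a short script. Below is a minimal sketch, assuming PyWavelets for the wavelet analysis; the wavelet family ("db4") and decomposition depth are illustrative choices, as the abstract does not state them.

```python
import numpy as np
import pywt

def hearing_model_features(block, wavelet="db4", levels=4):
    block = block / np.max(np.abs(block))                # 1) normalize by maximum
    coeffs = pywt.wavedec(block, wavelet, level=levels)  # 2) multi-resolution analysis
    feats = []
    for i in range(len(coeffs)):
        # 3) re-synthesize one sub-band via the inverse DWT, zeroing the others
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        band = pywt.waverec(kept, wavelet)
        band = np.abs(np.diff(band))                     # 4) differentiate, 5) rectify
        feats.append(band.sum())                         # 6) integrate over the block
    return np.array(feats)
```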


Improvements on MFCC by Elaboration of the Filter Banks and Windows

  • Lee, Chang-Young
    • Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.131-144
    • /
    • 2007
  • In an effort to improve the performance of mel frequency cepstral coefficients (MFCC), we investigate the effects of varying the filter bank parameters and their associated windows on speech recognition rates. Specifically, the mel and bark scales are combined with various types of filter bank windows. The suggested methods are compared and evaluated in two independent ways: speech recognition experiments and the Fisher discriminant objective function. It is shown that the Hanning window based on the bark scale yields a 28.1% relative improvement in speech recognition error rate over the triangular window with the mel scale. Further work incorporating PCA and/or LDA as a postprocessor to MFCC extraction would be desirable.
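
As a rough sketch of the best-performing combination reported above, the filter bank below uses bark-spaced bands with a Hanning-shaped response in place of the usual mel-spaced triangles. The Zwicker bark approximation and all parameter values are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def hz_to_bark(f):
    # Zwicker's approximation of the bark scale
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_hanning_filterbank(n_filters=26, n_fft=512, sr=16000):
    freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)           # FFT bin frequencies
    barks = hz_to_bark(freqs)
    edges = np.linspace(barks[0], barks[-1], n_filters + 2)  # equal spacing in bark
    fbank = np.zeros((n_filters, len(freqs)))
    for m in range(n_filters):
        lo, hi = edges[m], edges[m + 2]
        inside = (barks >= lo) & (barks <= hi)
        # Hanning-shaped band response instead of a triangle
        fbank[m, inside] = 0.5 - 0.5 * np.cos(
            2.0 * np.pi * (barks[inside] - lo) / (hi - lo))
    return fbank
```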


Error Correction and Praat Script Tools for the Buckeye Corpus of Conversational Speech

  • Yoon, Kyu-Chul
    • Phonetics and Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.29-47
    • /
    • 2012
  • The purpose of this paper is to show how to convert the label files of the Buckeye Corpus of Spontaneous Speech [1] into Praat format and to introduce some of the Praat scripts that will enable linguists to study various aspects of the spoken American English present in the corpus. During the conversion process, several types of errors were identified and corrected, either manually or automatically, through the use of scripts. The Praat script tools that have been developed can extract large amounts of phonetic measures from the corpus, such as the VOT of plosives, the formants of vowels, word frequency information, and speech rates spanning several consecutive words. The script tools can also extract additional information concerning the phonetic environment of the target words or allophones.
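
For a flavor of the measurements such scripts automate, here is a minimal sketch using parselmouth, a Python interface to Praat (the paper's tools are native Praat scripts); the file name and vowel boundaries are hypothetical placeholders.

```python
import parselmouth

snd = parselmouth.Sound("s0101a.wav")          # hypothetical corpus audio file
formants = snd.to_formant_burg(max_number_of_formants=5)

start, end = 1.234, 1.310                      # hypothetical vowel interval (s)
mid = (start + end) / 2.0
f1 = formants.get_value_at_time(1, mid)        # F1 at the vowel midpoint (Hz)
f2 = formants.get_value_at_time(2, mid)        # F2 at the vowel midpoint (Hz)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```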

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.71-81
    • /
    • 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2,000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource supporting improvements in AVSR.
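
The character error rate quoted above is edit distance at the character level, normalized by the reference length. A minimal sketch:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    r, h = list(ref), list(hyp)
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                                 # deletions only
    for j in range(len(h) + 1):
        dp[0][j] = j                                 # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(r)

print(cer("음성 인식", "음성 안식"))  # one substitution in five characters -> 0.2
```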

Performance improvement and Realtime implementation in CELP Coder

  • 정창경
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.199-204
    • /
    • 1994
  • In this paper, we investigated a CELP speech coding algorithm using efficient pseudo-stochastic block codes, an adaptive codebook, and an improved fixed-gain codebook. Pseudo-stochastic block codes refer to stochastically populated block codes in which adjacent codewords in an innovation codebook are non-independent. The adaptive codebook was built from previously predicted speech data held in a storage-shift register. This CELP coding algorithm enables the coding of toll-quality speech at bit rates from 4.8 kbit/s to 9.6 kbit/s. The algorithm was implemented in real time on a TMS320C30 microprocessor.
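
As a rough illustration of the adaptive-codebook idea (not the paper's implementation), the codebook is simply a shift register of past excitation, searched for the pitch lag and gain that best match the current subframe target:

```python
import numpy as np

def adaptive_codebook_search(past_exc, target, min_lag=40, max_lag=147):
    """Find the lag/gain pair that best predicts the target subframe."""
    n = len(target)
    best_lag, best_gain, best_score = min_lag, 0.0, -np.inf
    for lag in range(max(min_lag, n), max_lag + 1):  # lags shorter than the
        start = len(past_exc) - lag                  # subframe are skipped here
        cand = past_exc[start:start + n]
        energy = float(np.dot(cand, cand))
        if energy == 0.0:
            continue
        corr = float(np.dot(target, cand))
        score = corr * corr / energy                 # normalized correlation
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain  # transmitted instead of the excitation itself
```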


A Study on the Classification of the Korean Consonants in the VCV Speech Chain

  • 최윤석;김기석;김원준;황희영
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.39 no.6
    • /
    • pp.607-615
    • /
    • 1990
  • In this paper, we propose experimental models to classify the consonants in the vowel-consonant-vowel (VCV) speech chain into four phonemic groups: nasals, liquids, plosives, and others. To classify fuzzy patterns such as speech, it is necessary to analyze the distribution of acoustic features over many training data. The classification rules are polynomial functions of up to fourth order obtained by regression analysis, which contribute collectively to the result. The final result shows a success rate of about 87% on data spoken by one male speaker.
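
A rough sketch of such regression-based classification rules appears below, assuming a one-vs-rest setup over a single acoustic feature for simplicity (the paper works with richer feature distributions): one polynomial of up to fourth order is fit per group, and a token is assigned to the group whose polynomial responds most strongly.

```python
import numpy as np

def train_group_polynomials(features, labels, groups, order=4):
    """Fit one least-squares polynomial per group to a 0/1 class indicator."""
    return {g: np.polyfit(features, (labels == g).astype(float), order)
            for g in groups}

def classify(x, polys):
    # assign to the group whose polynomial gives the largest response
    return max(polys, key=lambda g: np.polyval(polys[g], x))

groups = np.array(["nasal", "liquid", "plosive", "other"])
feats = np.random.rand(200)                 # placeholder acoustic feature values
labels = np.random.choice(groups, 200)      # placeholder group labels
polys = train_group_polynomials(feats, labels, groups)
print(classify(0.4, polys))
```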

An Implementation of Speech Recognition System for Car's Control

  • 이광석;김현덕
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.5 no.3
    • /
    • pp.451-458
    • /
    • 2001
  • In this paper, we propose a speech control system for operating various devices in a car through real-time spoken commands. The system detects the start and end points of speech in the A/D-converted data and performs recognition with the one-pass dynamic programming method. The results are displayed on a monitor, and the control data are passed to the control interfaces. Continuous-speech HMMs are built from control speech consisting of command words and digits for operating the various devices in the car. The recognition rate averages 97.3% for word and control speech, and 96.3% for digit speech.
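
The start/end-point detection step can be pictured with a simple short-time-energy detector; the abstract does not spell out its method, so the frame size and threshold below are illustrative assumptions.

```python
import numpy as np

def detect_endpoints(signal, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the detected speech region."""
    usable = len(signal) // frame_len * frame_len
    frames = signal[:usable].reshape(-1, frame_len)
    energy = (frames.astype(float) ** 2).sum(axis=1)   # short-time energy
    active = np.nonzero(energy > threshold_ratio * energy.max())[0]
    if len(active) == 0:
        return None                                    # no speech found
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```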


Would Wernicke's Aphasic Speech Vary with Auditory Comprehension Abilities and/or Lesion Loci?

  • Kim, Hyang-Hee;Lee, Young-Mi;Na, Duk-L.;Chung, Chin-Sang;Lee, Kwang-Ho
    • Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.69-83
    • /
    • 2006
  • Wernicke's aphasic speech is characterized by errors such as empty speech, jargon, paraphasia, fillers, and others. However, not all of these errors are observed in every patient, presumably due to differing auditory comprehension (AC) abilities and/or lesion loci. The purpose of this study was thus to clarify the speech characteristics of Wernicke's aphasics according to AC level (better vs. worse) and lesion locus (Wernicke's area, WA vs. non-Wernicke's area, NWA). The authors divided 21 Wernicke's aphasic patients into four groups based on their AC levels and lesion loci. The results showed that the four groups differed only in CIU (Correct Information Unit) rate. The groups with better AC ability had higher CIU rates than the groups with worse AC, regardless of lesion locus (WA or NWA). It was therefore concluded that CIU rate, the differentiating speech variable, was most likely related to AC level but not to lesion locus.
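
The CIU rate that separated the groups is commonly computed as the percentage of spoken words that qualify as correct information units; the CIU judgments themselves are made by trained raters. A one-line sketch:

```python
def ciu_rate(n_words: int, n_cius: int) -> float:
    """Percentage of spoken words counted as correct information units."""
    return 100.0 * n_cius / n_words

print(ciu_rate(150, 96))  # 96 CIUs in 150 words -> 64.0%
```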


Implementation of Variable Threshold Dual Rate ADPCM Speech CODEC Considering the Background Noise

  • Yang, Jae-Seok;Han, Kyong-Ho
    • Proceedings of the KIEE Conference
    • /
    • 2000.07d
    • /
    • pp.3166-3168
    • /
    • 2000
  • This paper proposes a variable-threshold dual-rate ADPCM coding method, modified from the standard ITU-T G.726 ADPCM, for speech quality improvement. The speech quality of variable-threshold dual-rate ADPCM is better than that of single-rate ADPCM in noisy environments, without increased complexity, through the use of the zero-crossing rate (ZCR). The ZCR is used to divide the input samples into two categories, noise and speech: samples with a higher ZCR are categorized as the noisy region, and samples with a lower ZCR as the speech region. The noisy region uses a higher threshold value and is compressed at 16 kbit/s for a reduced bit rate, while the speech region uses a lower threshold value and is compressed at 40 kbit/s for improved speech quality. Compared with conventional ADPCM, which uses a fixed coding rate, the proposed method improves noise behavior without increasing the bit rate. For real-time applications, the ZCR calculation was chosen as a simple way to obtain background-noise information before heavier speech analysis such as the FFT, and experiments showed that this simple calculation can be used without an increase in complexity. Dual-rate ADPCM can thus reduce the amount of transferred data efficiently without increasing complexity or degrading speech quality, so the results of this paper can be applied to real-time speech applications such as Internet telephony and VoIP.
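
An illustrative sketch of the rate-selection step, assuming frame-wise ZCR with an arbitrary fixed threshold (the paper's variable-threshold logic is not reproduced): high-ZCR frames are treated as noise-like and coded at 16 kbit/s, low-ZCR frames as speech at 40 kbit/s.

```python
import numpy as np

def zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return float(np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:])))

def select_rate(frame, zcr_threshold=0.25):
    # illustrative threshold value, not the paper's
    return 16000 if zcr(frame) > zcr_threshold else 40000   # bit/s

frame = np.random.randn(240)        # placeholder 30 ms frame at 8 kHz
print(select_rate(frame))
```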
