• Title/Summary/Keyword: Pronunciation Error

43 results

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook; Waibel, Alex
    • The Journal of the Acoustical Society of Korea, v.21 no.1E, pp.3-11, 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
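The morpheme-merging step can be illustrated with a minimal sketch. The length and frequency thresholds below are illustrative assumptions, not the paper's actual criteria:

```python
from collections import Counter

def merge_frequent_pairs(sentences, min_count=2, max_len=3):
    """Concatenate adjacent morpheme pairs that are short and frequent."""
    # count adjacent pairs in which at least one member is short
    pairs = Counter()
    for sent in sentences:
        for a, b in zip(sent, sent[1:]):
            if len(a) <= max_len or len(b) <= max_len:
                pairs[(a, b)] += 1
    merges = {p for p, c in pairs.items() if c >= min_count}
    # greedy left-to-right merge pass over each sentence
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in merges:
                out.append(sent[i] + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged
```

Merged units like these lengthen the recognition tokens, which is what reduces insertion and deletion errors on very short morphemes.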

Correlation analysis of linguistic factors in non-native Korean speech and proficiency evaluation (비원어민 한국어 말하기 숙련도 평가와 평가항목의 상관관계)

  • Yang, Seung Hee; Chung, Minhwa
    • Phonetics and Speech Sciences, v.9 no.3, pp.49-56, 2017
  • Much research attention has been directed to how native speakers perceive non-native speakers' oral proficiency. To investigate the generalizability of previous findings, this study examined segmental, phonological, accentual, and temporal correlates of native speakers' evaluation of L2 Korean proficiency produced by learners of various levels and nationalities. Our experimental results show that proficiency ratings by native speakers correlate significantly not only with rate of speech but also with segmental accuracy. Segmental errors have the highest correlation with the proficiency of L2 Korean speech, a finding we further verified across substitution, deletion, and insertion error rates. Although phonological accuracy was expected to be highly correlated with the proficiency score, it was the least influential measure. Another new finding of this study is that the role of pitch and accent has so far been underemphasized in non-native Korean speech perception studies. This work will serve as groundwork for the development of an automatic assessment module in a Korean CAPT system.

Korean speech recognition based on grapheme (문자소 기반의 한국어 음성인식)

  • Lee, Mun-hak; Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea, v.38 no.5, pp.601-606, 2019
  • This paper studies Korean speech recognition using grapheme units (Cho-sung [onset], Jung-sung [nucleus], Jong-sung [coda]). We build an ASR (Automatic Speech Recognition) system without a G2P (Grapheme-to-Phoneme) step and show that deep-learning-based ASR systems can learn Korean pronunciation rules without it. The proposed model is shown to reduce the word error rate given sufficient training data.
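The grapheme units themselves can be recovered arithmetically from Unicode, since each precomposed Hangul syllable encodes its (onset, nucleus, coda) indices; a minimal sketch:

```python
# standard Unicode jamo orderings for the three grapheme positions
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")        # 19 onsets
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 nuclei
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 codas + empty

def to_graphemes(syllable: str):
    """Decompose one precomposed Hangul syllable into (onset, nucleus, coda)."""
    code = ord(syllable) - 0xAC00
    assert 0 <= code < 11172, "not a precomposed Hangul syllable"
    cho, rem = divmod(code, 21 * 28)
    jung, jong = divmod(rem, 28)
    return CHO[cho], JUNG[jung], JONG[jong]
```

For example, 한 decomposes into (ㅎ, ㅏ, ㄴ); a grapheme-based system models these units directly and leaves the pronunciation rules to the network.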

Japanese and Korean speakers' production of Japanese fricative /s/ and affricate /ts/

  • Yamakawa, Kimiko; Amano, Shigeaki
    • Phonetics and Speech Sciences, v.14 no.1, pp.13-19, 2022
  • This study analyzed the pronunciations of Japanese fricative /s/ and affricate /ts/ by 24 Japanese and 40 Korean speakers using the rise and steady+decay durations of their frication part in order to clarify the characteristics of their pronunciations. Discriminant analysis revealed that Japanese speakers' /s/ and /ts/ were well classified by the acoustic boundaries defined by a discriminant function. Using this boundary, Korean speakers' production of /s/ and /ts/ was analyzed. It was found that, in Korean speakers' pronunciation, misclassification of /s/ as /ts/ was more frequent than that of /ts/ as /s/, indicating that both the /s/ and /ts/ distributions shift toward short rise and steady+decay durations. Moreover, their distributions were very similar to those of Korean fricatives and affricates. These results suggest that Korean speakers' classification error might be because of their use of Korean lax and tense fricatives to pronounce Japanese /s/, and Korean lax and tense affricates to pronounce Japanese /ts/.

Hangul Vowel Input System for Electronic Networking Devices (정보통신 단말기를 위한 한글 모음 입력 시스템)

  • Kang, Seung-Shik; Hahn, Kwang-Soo
    • The KIPS Transactions: Part B, v.12B no.4 s.100, pp.507-512, 2005
  • Hand-held devices offer only a small number of input buttons for writing Hangul words. As a quick and convenient way of entering Hangul vowels with a few buttons, we propose a vowel input system in which all vowels are composed from eight basic vowels. Our system supports a fast input speed by forming every diphthong from one or two strokes. It also adopts a multiple input method for diphthongs, so that users can enter a diphthong in a user-friendly way based on either its written form or its pronunciation similarity. Furthermore, we added an error correction function for similar vowels arising from vowel harmony rules. Compared to previous methods, ours performed better in input speed and error correction.

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

  • Jin, Hye-won; Lee, A-Hyeon; Chae, Ye-Jin; Park, Su-Hyun; Kang, Yu-Jin; Lee, Soowon
    • Journal of the Korea Society of Computer and Information, v.26 no.10, pp.1-7, 2021
  • Most research on correcting speech recognition errors is based on English, and there is not enough research on Korean speech recognition. Compared to English, however, Korean speech recognition produces many errors due to linguistic characteristics of the Korean language, such as fortis (tensification) and liaison, so research on Korean speech recognition is needed. Furthermore, earlier works focused primarily on edit distance algorithms and syllable restoration rules, making it difficult to correct the error types caused by fortis and liaison. In this paper, we propose a context-sensitive post-processing model for speech recognition using an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. Experiments showed that the model improved speech recognition performance from 64% to 77% for fortis, from 74% to 90% for liaison, and from 69% to 84% on average. Based on these results, it seems possible to apply the proposed model to real-world applications based on speech recognition.
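The edit-distance baseline that the abstract contrasts with can be sketched as follows (the LSTM sequence-to-sequence model itself is not reproduced here; this is only the classic Levenshtein measure between a recognized string and its correction):

```python
def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance between a recognized string and its reference."""
    dp = list(range(len(ref) + 1))  # distances for the empty prefix of hyp
    for i, ch in enumerate(hyp, 1):
        prev, dp[0] = dp[0], i
        for j, cr in enumerate(ref, 1):
            cur = min(dp[j] + 1,              # deletion
                      dp[j - 1] + 1,          # insertion
                      prev + (ch != cr))      # substitution (or match)
            prev, dp[j] = dp[j], cur
    return dp[-1]
```

A liaison error such as recognizing 같이 where 가치 was intended differs in both syllables, so purely distance-based rules cannot tell which direction to correct; that context sensitivity is what the proposed model supplies.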

Acoustic model training using self-attention for low-resource speech recognition (저자원 환경의 음성인식을 위한 자기 주의를 활용한 음향 모델 학습)

  • Park, Hosung; Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea, v.39 no.5, pp.483-489, 2020
  • This paper proposes acoustic model training using self-attention for low-resource speech recognition, where it is difficult for an acoustic model to distinguish certain phones: for example, the plosives /d/ and /t/, the plosives /g/ and /k/, and the affricates /z/ and /ch/. During acoustic model training, self-attention generates attention weights from the deep neural network model; in this study, these weights handle similar-pronunciation errors in low-resource speech recognition. When the proposed method was applied to a Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TDNN-OPGRU)-based acoustic model, the proposed model showed a 5.98 % word error rate, an absolute improvement of 0.74 % over the baseline TDNN-OPGRU model.
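The attention weights mentioned above are, in the standard formulation, a softmax over scaled dot products; a minimal NumPy sketch (not the paper's specific TDNN-OPGRU integration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of frame vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # numerically stable softmax over the key axis:
    # one weight distribution per input frame
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, w
```

Each output frame is thus a weighted mix over all frames, which lets the model pull in context when two phones sound alike in isolation.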

Performance Improvement of Connected Digit Recognition by Considering Phonemic Variations in Korean Digit and Speaking Styles (한국어 숫자음의 음운변화 및 화자 발성특성을 고려한 연결숫자 인식의 성능향상)

  • 송명규; 김형순
    • The Journal of the Acoustical Society of Korea, v.21 no.4, pp.401-406, 2002
  • Each Korean digit is a single syllable, so recognizers, like human listeners, often have difficulty recognizing it. When digit strings are pronounced, the original pronunciation of each digit changes substantially due to co-articulation. In addition, distortion caused by various channels and noises degrades recognition performance for Korean connected digit strings. This paper deals with techniques to improve recognition performance: defining a set of PLUs (phone-like units) that accounts for phonemic variations in Korean digits, and constructing a recognizer that handles speakers' various speaking styles. In speaker-independent connected digit recognition experiments using telephone speech, the proposed techniques with 1 Gaussian/state gave a string accuracy of 83.2%, i.e., a 7.2% error rate reduction relative to the baseline system. With 11 Gaussians/state, we achieved the highest string accuracy of 91.8%, i.e., a 4.7% error rate reduction.
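A lexicon covering phonemic variations of this kind can be sketched as a mapping from each digit to several alternative unit strings; the variant forms below are illustrative assumptions, not the paper's actual PLU set:

```python
# hypothetical pronunciation variants per digit, written as space-separated
# unit strings; extra variants cover coarticulation across digit boundaries
LEXICON = {
    "1": ["i l", "i r"],
    "2": ["i"],
    "6": ["y u k", "n y u k", "r y u k"],  # initial varies after a preceding coda
}

def expand_pronunciations(digits):
    """Enumerate every pronunciation sequence a digit string may take."""
    seqs = [[]]
    for d in digits:
        seqs = [s + [v] for s in seqs for v in LEXICON[d]]
    return [" ".join(s) for s in seqs]
```

The recognizer then scores all enumerated variants, so whichever surface form the speaker actually produced can match without being counted as an error.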

Investigation of Etymology of a Word 'Chal(刹)' from Temple and Verification of Fallacy, Circulated in the Buddhist Community (사찰 '찰(刹)'의 어원 규명과 불교계 통용 오류 검증)

  • Lee, Hee-Bong
    • Journal of Architectural History, v.32 no.1, pp.47-60, 2023
  • Due to a mistranslation from Sanskrit into Chinese, the East Asian Buddhist community misunderstands the original meaning of the fundamental word 'sachal(寺刹)'. The Sanskrit chattra, a parasol atop a venerated Indian stupa containing Buddha's sarira, became a symbol of majesty. The Indian stupa was transformed into a pagoda in China, and the prominent parasol on its summit was transliterated as chaldara(刹多羅), abbreviated to chal(刹), which finally came to designate the whole pagoda(塔). Sachal thus consists of a low-lying monastery and a high-rise pagoda; tapsa(塔寺), an archaic word for temple, means exactly the same as sachal, because chal means tap, pagoda. During the 7th century, however, a Buddhist monk erroneously transliterated the Sanskrit 'kshetra', meaning land, into the same word chal, despite the phonetic mismatch, and sutra translators copied the error for centuries. Japanese pioneer scholars made matters worse 100 years ago by publishing Sanskrit dictionaries that insisted on this phonetic transliteration even though the pronunciation of 'kshe-' is quite different from 'cha-'. Later scholars followed the fallacy without verification, and the reading of chal as 'land' now dominates the Buddhist community broadly, having hardened into a fixed collective dogma in East Asia. For the Buddhist community, it is most important to recognize that the same word has come to refer to completely different objects because of translation errors. As a research method, this study searches for the corresponding Sanskrit words in translated sutras and Buddhist dictionaries, analyzes their authenticity, and corrects the fallacy toward the truth.

Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition (대화체 연속음성 인식을 위한 한국어 대화음성 특성 분석)

  • 박영희; 정민화
    • The Journal of the Acoustical Society of Korea, v.21 no.3, pp.330-338, 2002
  • Spontaneous speech is ungrammatical and shows serious phonological variations, which make it extremely difficult to recognize compared with read speech. In this paper, for conversational speech recognition, we analyze transcriptions of real conversational speech and classify the characteristics of conversational speech from a speech recognition perspective. Reflecting these features, we build a baseline system for conversational speech recognition. The characteristics fall into long durations of silence, disfluencies, and phonological variations, each grouped by similar features. To deal with them, we first update the silence model and add a filled-pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the baseline morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% for the silence and garbage models, 0.73% for the filled-pause model, and 0.73% for phonological variations. Finally, we obtain a 27.92% MER for conversational speech recognition, which will serve as a baseline for further study.