Search | Korea Science

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.11 no.3 / pp.39-47 / 2019
We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals and convert them to fixed-length latent vectors using a long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing, for each word, the pronunciation sequence with the highest conditional probability. To analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment durations. Through speech recognition experiments using a larger training data set than in our previous work, we confirm that the new phone set yields better performance than conventional phoneme-based and grapheme-based units in both spontaneous and read speech recognition.
https://doi.org/10.13064/KSSS.2019.11.3.039
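
The cluster-count selection described in the abstract above can be illustrated with a minimal sketch. It assumes the latent vectors are rows of a NumPy array and that candidate labelings for each cluster count have already been produced by some clustering step; the abstract does not name the clustering algorithm, so that step is left out. The function names `davies_bouldin` and `pick_cluster_count` are hypothetical and are not taken from the paper.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower means tighter, better-separated clusters.

    X: (num_points, dim) array; labels: (num_points,) cluster ids.
    Requires at least two clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    # Pairwise distances between centroids.
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # exclude each cluster paired with itself
    ratios = (scatter[:, None] + scatter[None, :]) / dist
    # Worst-case similarity per cluster, averaged over clusters.
    return ratios.max(axis=1).mean()

def pick_cluster_count(X, candidate_labelings):
    """candidate_labelings maps a cluster count to a label array (assumed given)."""
    scores = {k: davies_bouldin(X, lab) for k, lab in candidate_labelings.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy data: two tight columns of points far apart.
X = np.array([[0, 0], [0, 2], [10, 0], [10, 2], [0, 1], [10, 1]], dtype=float)
two = np.array([0, 0, 1, 1, 0, 1])    # one cluster per column
three = np.array([0, 2, 1, 1, 0, 1])  # left column split unevenly
best, scores = pick_cluster_count(X, {2: two, 3: three})
```

On this toy data the two-cluster labeling scores lower than the three-cluster one, so it would be the one kept.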

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

김득수;황철준
- Journal of Korea Society of Industrial Information Systems / v.6 no.3 / pp.107-114 / 2001
This paper describes connected digit recognition in Korean that takes acoustic features into account. The recognition rate for connected digits is usually lower than that for words. Therefore, speech feature parameters and acoustic features are employed to build robust digit models, and recognition experiments confirm the effect of considering acoustic features. We used the KLE 4 connected digit data as the database and 19 continuous-distribution HMMs as PLUs (Phoneme-Like Units) built with phonetic rules. We tested two cases in the recognition experiments. In the first case, we built the phoneme models with conventional features, namely Mel-cepstrum and regression coefficients. In the second case, we built the phoneme models with expanded feature parameters and acoustic features. In both cases, we employed OPDP (One Pass Dynamic Programming) and FSA (Finite State Automata) for the recognition tests. When applying the FSN for recognition, we applied various acoustic features. As a result, we obtained a recognition rate of 55.4% for Mel-cepstrum and 67.4% for Mel-cepstrum with regression coefficients. We also obtained 74.3% for the expanded feature parameters and 75.4% when acoustic features were applied. Because applying acoustic features gave better results than the former method, we conclude that the proposed method is effective for connected digit recognition in Korean.

Analysis of Feature Parameter Variation for Korean Digit Telephone Speech according to Channel Distortion and Recognition Experiment (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 및 인식실험)

Jung Sung-Yun;Son Jong-Mok;Kim Min-Sung;Bae Keun-Sung
- MALSORI / no.43 / pp.179-188 / 2002
Improving the recognition performance of connected digit telephone speech remains an open problem. As a basic study toward that goal, this paper analyzes how the feature parameters of Korean digit telephone speech vary with channel distortion. MFCC is used as the feature parameter for analysis and recognition. To analyze the effect of telephone channel distortion, which differs from call to call, MFCCs are first obtained from the connected digit telephone speech for each phoneme that occurs in the Korean digits. Then CMN, RTCN, and RASTA are applied to the MFCCs as channel compensation techniques. Using the feature parameters MFCC, MFCC+CMN, MFCC+RTCN, and MFCC+RASTA, the variances of the phonemes are analyzed and recognition experiments are carried out for each case. The experimental results are discussed together with our findings.
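
As a rough illustration of one of the channel compensation techniques named in the abstract above, the sketch below assumes that CMN denotes cepstral mean normalization applied per utterance. A stationary linear channel is additive in the cepstral domain, so subtracting the per-coefficient mean over the utterance removes a constant channel offset. The function name and the random stand-in for MFCC frames are illustrative assumptions, not details from the paper.

```python
import numpy as np

def cepstral_mean_normalize(mfcc):
    """Subtract the per-coefficient mean over all frames of one utterance.

    mfcc: array of shape (num_frames, num_coeffs).
    """
    mfcc = np.asarray(mfcc, dtype=float)
    return mfcc - mfcc.mean(axis=0, keepdims=True)

# Stand-in data: random values in place of real MFCC frames (assumption).
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))
channel_offset = rng.normal(size=13)   # constant cepstral offset of a channel
distorted = clean + channel_offset

# After CMN the constant offset is gone, so both versions coincide.
same_after_cmn = np.allclose(
    cepstral_mean_normalize(clean), cepstral_mean_normalize(distorted)
)
```

The sketch covers only the constant-offset case; it says nothing about how RTCN or RASTA are defined in the paper.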

A study on the recognition system of Korean phonemes using filter-bank analysis (필터뱅크 분석법을 사용한 한국어 음소의 인식에 관한 연구)

남문현;주상규
- 제어로봇시스템학회:학술대회논문집 / 1987.10b / pp.473-478 / 1987
The purpose of this study is to design a phoneme-class recognition system for the Korean language using filter-bank analysis and the zero-crossing rate method. First, the speech signals are separated by 16 bandpass filters to obtain their short-time spectrum and digitized by a 16-channel A/D converter. Then, decision rules for recognizing unknown speech signals are derived from a set of features extracted from the patterns of ratios of each channel's energy level to the overall energy level. In this experiment, the recognition rate was about 93.1 percent for 7 vowels in a multi-talker environment and 74.4 percent for 10 initial sounds with a single speaker.
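
The two kinds of features mentioned in the abstract above can be sketched in a few lines. This is a minimal digital illustration, assuming per-channel energies are already available as numbers; the original system used bandpass filters and an A/D converter, and its decision rules are not given in the abstract. Both function names are hypothetical.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    frame = np.asarray(frame, dtype=float)
    negative = frame < 0
    return float(np.mean(negative[1:] != negative[:-1]))

def channel_energy_ratios(channel_energies):
    """Each channel's energy divided by the total energy across channels."""
    e = np.asarray(channel_energies, dtype=float)
    return e / e.sum()

# Illustrative inputs only; real frames and energies would come from speech.
zcr_alternating = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
zcr_flat = zero_crossing_rate([1.0, 2.0, 3.0])
ratios = channel_energy_ratios([1.0, 1.0, 2.0])
```

A rule-based classifier of the kind the abstract describes would then compare such ratios and rates against thresholds, which are not specified in the source.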

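The necessity of cognitive-constructivist instructional design in Chinese pronunciation education (認知建枸主義教學說計在漢語發音教育中的必要性)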

Lee, Seon-Hui
- 중국학논총 / no.66 / pp.85-103 / 2020
We use prototypes (also known as referents in semiotics) to understand the outside world. Different language users use different prototypes to decode the same sound. When Korean learners study Chinese as a foreign language, the target-language prototypes they use during sound perception differ from those of native Chinese speakers. The purpose of this paper is to examine speech perception theory and constructivist teaching theory, and to suggest that Chinese language teachers take a constructivist approach when designing their courses. To this end, we address three points. First, we review speech perception theory and constructivist teaching theory. Second, based on previous studies, we show that learners' prototypes differ from those of native Chinese speakers and that this causes errors in listening and pronunciation. Finally, we introduce two simple speech visualization programs developed to help learners with pronunciation.

A Study on Phonemicization in French Abbreviation (불어 축소어의 음소화 연구)

Ko, Kwang-Jin;Lee, Jung-Won
- Speech Sciences / v.8 no.3 / pp.105-113 / 2001
Abbreviations (especially acronyms) are used more and more nowadays. However, we use them carelessly, unaware that they follow certain reduction patterns. In this paper, we first analyze the correct oralization and phonemicization of abbreviations on the basis of group types. We then propose necessary and sufficient conditions for determining how acronyms should be read or pronounced when they are converted from text to speech. We limited the use of acronyms to grapheme-phoneme relations and minimized the diversity of usage, which allowed us to define the characteristics of the phonetic chains clearly. In conclusion, we found that phonemicization is more frequent in acronyms produced with these phonetic chain characteristics, and that such phonetically based acronyms are increasingly used in the fields of aviation and medicine.

The influence of task demands on the preparation of spoken word production: Evidence from Korean

Choi, Tae-Hwan;Oh, Sujin;Han, Jeong-Im
- Phonetics and Speech Sciences / v.9 no.4 / pp.1-7 / 2017
Speech production studies have shown that the preparation unit of spoken word production is language-particular: onset phonemes for English and Dutch, syllables for Mandarin Chinese, and morae for Japanese. However, results have been inconsistent on whether the onset phoneme is a planning unit of spoken word production in Korean. In this study, two sets of experiments, implicit priming and word naming with the form preparation paradigm, investigated possible influences of task demands on phonological preparation in native Korean adults. Only the word naming task, and not the implicit priming task, showed a significant onset priming effect, even though there were significant syllable priming effects in both tasks. Following the attentional theory (O'Séaghdha & Frazer, 2014), these results suggest that task demands might play a role in the absence or presence of onset priming effects in Korean. Native Korean speakers could maintain their attention on the shared onset phonemes in word naming, which is not very demanding, while they have difficulty allocating their attention to such units in the more cognitively demanding implicit priming task, even though both tasks involve accessing phonological codes. These findings demonstrate that there are cross-linguistic differences in the first selectable unit in the preparation of spoken word production, but that within a single language the preparation unit might not be immutable.
https://doi.org/10.13064/KSSS.2017.9.4.001

Comparison of the pronunciation of word-initial liquids between generations in Korean (세대 간 어두 유음의 발음 양상 비교)

Yun, Eunmi;Sim, Hyeran;Park, Seegyoon;Kim, Hyungi;Kang, Jinseok
- Phonetics and Speech Sciences / v.9 no.3 / pp.7-15 / 2017
The purpose of this study was to investigate how word-initial liquid sounds in Korean differ between generations. Five women in their 50s and seven in their 20s participated in the experiment. We examined FL (formant of liquids) and voice sustained time using Praat software. Three native English speakers were asked to judge the Korean speakers' recorded speech samples, marking each as [l] or [r] on an evaluation sheet. The results of the two experiments revealed three important aspects. First, there was a statistically significant difference between the two groups in the FL of the words 'racket' and 'ruby.' Second, we found statistically significant differences in 'rhythm', 'ruby' and 'litter' in the measured duration of the acoustic data. Third, there was no difference in pronunciation between the two groups according to the phonemes of the original language. The results of this study showed that it is difficult to say that the duration of word-initial liquids and the phoneme difference of the original language are indicators that distinguish the word-initial liquids between generations. It was also seen that the pronunciation of Korean word-initial liquid sounds varied across generations.
https://doi.org/10.13064/KSSS.2017.9.3.007

An acoustic study of word-timing with references to Korean (한국어 분류에 관한 음향음성학적 연구)

김대원
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.323-327 / 1994
There have been three contrastive claims over the classification of Korean. To answer the classification question, timing variables that would determine the durations of syllable, word and foot were investigated with various words, either in isolation or in sentence contexts, using Soundcoup/16 on Macintosh P.C. A total of 284 utterances, obtained from six Korean speakers, were used. It was found 1) that the durational pattern for words tended to be maintained in utterances, regardless of position, subjects and dialects, 2) that the syllable duration was determined both by the types of phoneme and by the number of phonemes, the word duration both by the syllable complexity and by the number of syllables, and the foot duration by the word complexity, 3) that there was a constractive relationship between foot length in syllables and foot duration, and 4) that the foot duration varied generally with word complexity if the same word did not occur both in the first foot and in the second foot. On the basis of these findings, it was concluded that Korean is a word-timed language in which, all else being equal, including tempo, emphasis, etc., the inherent durational pattern for words tends to be maintained in utterances. The main differences between stress timing, syllable timing and word timing were also discussed.

Initial-syllable lengthening of an utterance-internal phrase in Korean

Yun, Ilsung
- Phonetics and Speech Sciences / v.6 no.2 / pp.141-151 / 2014
This study reports anti-hierarchical initial-syllable lengthening of an utterance-internal phrase in Korean. That is, the phrase-initial syllable (e.g., /a/ of "apa-do" or /ma/ of "mapa-do") starting with a voiced phoneme (i.e., vowels or voiced consonants) manifests itself as significantly longer when it is preceded by another phrase without a pause than when it leads an utterance or follows a pause utterance-internally. The phenomenon was examined with regard to two other factors: (1) tempo and (2) tenseness of the consonant (/p, p', pʰ/) following the target syllable /a/. First, the effect of tempo on initial lengthening was not significant. Apart from the statistical significance, however, a tendency was observed, i.e., the slower the tempo is, the greater the lengthening. By contrast, the faster the tempo is, the higher the ratio (%) of lengthening. Second, contrary to our expectations, initial-syllable lengthening was even greater before tense stops /p', pʰ/ than before lax stop /p/ regardless of tempo, and it was remarkable when it comes to the ratio (%), which means that initial lengthening is free of the pre-consonantal vowel shortening effect. Final-syllable lengthening is a pre-boundary marker, while the initial-syllable lengthening is regarded as a post-boundary marker of a phrase.
https://doi.org/10.13064/KSSS.2014.6.2.141

