Search | Korea Science

A Study on Recognition Units and Methods to Align Training Data for Korean Speech Recognition (한국어 인식을 위한 인식 단위와 학습 데이터 분류 방법에 대한 연구)

황영수
- Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.40-45 / 2003
This study examines recognition units and phoneme segmentation. When building a large-vocabulary speech recognition system, it is better to use the segment, rather than the syllable or the word, as the recognition unit. In this paper, we study the proper recognition units and phoneme segmentation for Korean speech recognition. For the experiments, we use the OGI speech toolkit from the U.S.A. The results show that the recognition rate is higher when a diphthong is treated as a single unit than when it is treated as two units, i.e., a glide plus a vowel. A recognizer trained on manually aligned data is slightly better than one trained on automatically aligned data. The recognition rate is also higher when the biphone is used as the recognition unit than when the mono-phoneme is used.

The Relationship Between Children's Reading Ability of Environmental Print and Phonological Awareness (유아의 환경인쇄물 읽기 능력과 음운론적 인식 능력 간의 관계)

Kim, Hyo Jin;Son, Seung Hee;Rha, Jong Hae
- Korean Journal of Childcare and Education / v.9 no.6 / pp.107-127 / 2013
The purpose of this study was to investigate age differences in children's ability to read environmental print and in their phonological awareness, and the relationship between the two. The subjects were 90 children, 3 to 4 years of age. The Children's Reading Abilities of Environmental Print Scale (CRAEPS) developed by Son (2012) and the Phonological Awareness Scale (PAS) revised by Choi (2007) were used to measure children's ability to read environmental print and their phonological awareness. The results were as follows. First, 4-year-olds performed significantly better than 3-year-olds on the environmental print reading tasks. The 4-year-olds also scored significantly higher than the 3-year-olds in syllable counting, syllable deletion, and phoneme substitution. Second, children's scores on the environmental print reading tasks were positively correlated with phonological awareness. In other words, the 3-year-olds who read environmental print better scored higher in syllable counting, and the 4-year-olds who read environmental print better scored higher in syllable counting, syllable deletion, and phoneme substitution.
https://doi.org/10.14698/jkcce.2013.9.6.107

Korean Word Recognition Using Diphone-Level Hidden Markov Model (Diphone 단위의 hidden Markov model을 이용한 한국어 단어 인식)

Park, Hyun-Sang;Un, Chong-Kwan;Park, Yong-Kyu;Kwon, Oh-Wook
- The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.14-23 / 1994
In this paper, speech units appropriate for recognition of the Korean language are studied. For better speech recognition, co-articulatory effects within an utterance should be considered when selecting a recognition unit. One way to model such effects is to use larger units of speech. The diphone was found to be a good recognition unit because it can model transitional regions explicitly. When diphones are used, stationary phoneme models may be inserted between diphones. Computer simulation of isolated word recognition was done with a 7-word database spoken by seven male speakers. The best performance was obtained when transition regions between phonemes were modeled by two-state HMMs and stationary phoneme regions by one-state HMMs, excluding /b/, /d/, and /g/. Merging rarely occurring diphone units increased the recognition rate from $93.98\%$ to $96.29\%$. In addition, a local interpolation technique was used to smooth a poorly modeled HMM with a well-trained HMM. With this technique, a recognition rate of $97.22\%$ was obtained after merging some diphone units.
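As a rough illustration of the unit inventory this abstract describes, the sketch below turns a phoneme sequence into diphone (transition) units and optionally inserts stationary phoneme units between them. The function name `to_diphones`, the `#` boundary symbol, and the choice of which phonemes count as stationary are assumptions made for illustration; the paper's actual unit definitions and HMM topologies are not reproduced here.

```python
def to_diphones(phonemes, boundary="#", stationary=frozenset()):
    """Convert a phoneme sequence into diphone units.

    Each diphone "x-y" stands for the transition from x to y. When y is
    listed in `stationary`, a stationary unit for y is inserted after the
    transition into it, so it sits between two diphones.
    All names and symbols here are hypothetical.
    """
    padded = [boundary, *phonemes, boundary]
    units = []
    for left, right in zip(padded, padded[1:]):
        units.append(f"{left}-{right}")   # transition region
        if right in stationary:
            units.append(right)           # stationary region
    return units


# Example: treat only the vowel "a" as having a stationary model.
print(to_diphones(["k", "a", "m"], stationary={"a"}))
```

In an HMM recognizer along these lines, each diphone unit would map to a transition model and each stationary unit to a separate, shorter model.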

Weighted Disassemble-based Correction Method to Improve Recognition Rates of Korean Text in Signboard Images (간판영상에서 한글 인식 성능향상을 위한 가중치 기반 음소 단위 분할 교정)

Lee, Myung-Hun;Yang, Hyung-Jeong;Kim, Soo-Hyung;Lee, Guee-Sang;Kim, Sun-Hee
- The Journal of the Korea Contents Association / v.12 no.2 / pp.105-115 / 2012
In this paper, we propose a correction method based on phoneme-unit segmentation that uses a weighted Disassemble Levenshtein Distance to correct misrecognized Korean text in signboard images. The proposed method segments the recognized text into phoneme units, calculates distances, and finds the best-matching text in a signboard text database. To verify the efficiency of the proposed method, a database dictionary was built from 1.3 million words of nationwide signboards by removing duplicated words. We compared the proposed method with Levenshtein Distance and Disassemble Levenshtein Distance, which are common representative string comparison algorithms. The proposed method based on weighted Disassemble Levenshtein Distance improved recognition rates by 29.85% and 6% on average compared to these conventional methods, respectively.
https://doi.org/10.5392/JKCA.2012.12.02.105
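The idea of comparing Korean strings at the phoneme (jamo) level can be sketched as follows. This is a minimal illustration, not the authors' algorithm: Hangul syllables are disassembled with the standard Unicode arithmetic (base `0xAC00`, 21 medials, 28 finals), and the per-slot substitution weights in `SLOT_WEIGHTS`, as well as the names `disassemble`, `weighted_distance`, and `best_match`, are assumptions chosen for the example.

```python
# Hypothetical substitution weights per syllable slot (not from the paper).
SLOT_WEIGHTS = {"initial": 1.0, "medial": 0.8, "final": 0.6}


def disassemble(text):
    """Split precomposed Hangul syllables into (slot, index) units."""
    units = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                  # Hangul syllable block
            initial, rest = divmod(code, 21 * 28)
            medial, final = divmod(rest, 28)
            units.append(("initial", initial))
            units.append(("medial", medial))
            if final:                          # index 0 means no final consonant
                units.append(("final", final))
        else:
            units.append(("other", ch))        # keep non-Hangul characters whole
    return units


def weighted_distance(a, b, weights=SLOT_WEIGHTS):
    """Levenshtein distance over disassembled units with slot-weighted substitution."""
    x, y = disassemble(a), disassemble(b)
    prev = [float(j) for j in range(len(y) + 1)]
    for i, ux in enumerate(x, start=1):
        cur = [float(i)]
        for j, uy in enumerate(y, start=1):
            if ux == uy:
                sub = 0.0
            elif ux[0] == uy[0]:
                sub = weights.get(ux[0], 1.0)  # same slot: weighted cost
            else:
                sub = 1.0                      # different slots: full cost
            cur.append(min(prev[j] + 1.0,      # deletion
                           cur[j - 1] + 1.0,   # insertion
                           prev[j - 1] + sub)) # substitution or match
        prev = cur
    return prev[-1]


def best_match(recognized, dictionary):
    """Return the dictionary entry closest to the recognized text."""
    return min(dictionary, key=lambda word: weighted_distance(recognized, word))
```

With these assumed weights, two syllables that differ only in the final consonant are closer than two that differ in the initial consonant, which is one plausible way such a weighting could favor likely OCR confusions.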

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.11 no.3 / pp.39-47 / 2019
We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals and convert them to fixed-length latent vectors using a long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing, for each word, the pronunciation sequence with the highest conditional probability. To analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment durations. Through speech recognition experiments using a larger training data set than in our previous work, we confirm that the new phone set yields better performance than conventional phoneme-based and grapheme-based units in both spontaneous and read speech recognition.
https://doi.org/10.13064/KSSS.2019.11.3.039
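The model-selection step mentioned in this abstract, picking the clustering with the lowest Davies-Bouldin index, can be sketched in plain Python. The LSTM encoder and the clustering itself are outside this sketch: it assumes the candidate clusterings are already given as lists of points, and the names `davies_bouldin` and `pick_best` are illustrative only.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of vectors).

    Lower is better: clusters are compact and far apart.
    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Average distance of each cluster's points to its own centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


def pick_best(candidates):
    """Choose the candidate clustering with the lowest Davies-Bouldin index."""
    return min(candidates, key=davies_bouldin)


# Two ways to split the same four points into two clusters.
good = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
bad = [[[0, 0], [10, 0]], [[0, 2], [10, 2]]]
print(davies_bouldin(good), davies_bouldin(bad))
```

In practice the candidates would be clusterings of the latent vectors for different numbers of clusters, and the selected clustering would define the extended phone set.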

A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering (결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구)

정현열;정호열;오세진;황철준;김범국
- The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.199-210 / 2002
In this paper, we study speech recognition using the HM-Net topology design algorithm based on decision tree state-clustering to improve the performance of acoustic models. Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations defined in Korean phonetics and construct the phoneme question set for the phonetic decision tree. The HM-Net topology design algorithm keeps the basic structure of the SSS (Successive State Splitting) algorithm and re-splits the states of pre-constructed context-dependent acoustic models. That is, it generates a phonetic decision tree for each model state using the phoneme question set, and iteratively trains the state sequences of the context-dependent acoustic models using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the algorithm, we carried out speech recognition experiments on 452 words from the Center for Korean Language Engineering (KLE452) and 200 sentences from an air flight reservation task (YNU200). Experimental results show that recognition accuracy improved progressively as the number of states was varied through state splitting in the phoneme, word, and continuous speech recognition experiments. We obtained average phoneme and word recognition accuracies of 71.5% and 99.2%, respectively, with 2,000 states, and an average continuous speech recognition accuracy of 91.6% with 800 states. We also carried out word recognition experiments using HTK (HMM Toolkit), which performs state tying, for comparison with the parameter sharing of the HM-Net topology design algorithm. In the word recognition experiments, the HM-Net topology design algorithm achieved on average 4.0% higher recognition accuracy than the context-dependent acoustic models generated by HTK, which suggests its effectiveness.

Dutch Listeners' Perception of Korean Stop Consonants

Choi, Jiyoun
- Phonetics and Speech Sciences / v.7 no.1 / pp.89-95 / 2015
We explored Dutch listeners' perception of Korean three-way contrast of fortis, lenis, and aspirated stops. The three Korean stops are all voiceless word-initially, whereas Dutch distinguishes between voiced and voiceless stops, so Korean voiceless stops were expected to be difficult for the Dutch listeners. Among the three Korean stops, fortis stops are phonetically most similar to Dutch voiceless stops, thus they were expected to be the easiest to distinguish for the Dutch listeners. Dutch and Korean listeners carried out a discrimination task using three crucial comparisons, i.e., fortis-lenis, fortis-aspirated, and lenis-aspirated stops. Results showed that discrimination between lenis and aspirated stops was the most difficult among the three comparisons for both Dutch and Korean listeners. As expected, Dutch listeners discriminated fortis from the other stops relatively accurately. It seems likely that Dutch listeners relied heavily on VOT but less on F0 when discriminating between the three Korean stops.
https://doi.org/10.13064/KSSS.2015.7.1.089

A Study on Consonant/Vowel/Unvoiced Consonant Phonetic Value Segmentation and Recognition of Korean Isolated Word Speech (한국어 고립 단어 음성의 자음/모음/유성자음 음가 분할 및 인식에 관한 연구)

Lee, Jun-Hwan;Lee, Sang-Beom
- The Transactions of the Korea Information Processing Society / v.7 no.6 / pp.1964-1972 / 2000
Acoustically, the Korean language, owing to its own peculiar properties, produces phonetic values that differ in form from phonemes. Therefore, building an extended recognition system for understanding Korean requires first studying a Korean rule-based system, so that it can be used as post-processing for the Korean recognition system. In this paper, a text-based Korean rule-based system featuring the sound-change rules peculiar to Korean is constructed, and, based on the text-based phonetic value results of this system, preliminary phonetic value segmentation border points with non-uniform blocks are extracted from Korean isolated word speech. By merging and recognizing the non-uniform blocks between the extracted border points, the possibility of recognizing Korean speech in the form of phonetic values is investigated.

A Study on the Efficacy of Teaching English Discourse Intonation: Blended Learning (담화속 영어 억양교육의 효율성에 대한 실험연구: 혼합교수모듈을 중심으로)

Kim, He-Kyung
- Speech Sciences / v.14 no.3 / pp.31-46 / 2007
This study investigates whether training in pitch manipulation helps Korean speakers reduce intonation errors, based on a review of many previous studies on Korean speakers' phonetic realization of intonation. The previous studies have indicated that Korean speakers have problems with pitch manipulation in their production of English word stress, sentence stress, and eventually intonation. To train Korean speakers to phonetically realize English pitch patterns, a blended learning module was run for two weeks: face-to-face instruction for six hours and e-learning instruction for three hours in total. This module was designed to help Korean speakers realize pitch as a distinctive phoneme. An acoustic assessment of five Korean female English speakers shows that training in pitch manipulation helps Korean English speakers reduce the intonation errors indicated in the previous studies reviewed.

Extra Vowel Addition Produced in Korean Students' English Pronunciation of Word-final Stop Consonants (영어 폐쇄자음 발음 뒤에 나타나는 모음추가 현상)

Hwang, Young-Soon
- Speech Sciences / v.7 no.4 / pp.169-186 / 2000
This paper aims to confirm, by experiment, the mispronunciations of native Korean students caused by differences between the phonetic and phonological systems of English and Korean, and to identify what remains to be done. Many Korean students tend to differentiate word-final stop consonants not by vowel duration or by allophones but by the phoneme of the consonant itself. In English, stop sounds are realized as aspirated, unaspirated, or unreleased depending on the conditions. In Korean, however, these are not allophones of phonemes but distinct phonemes. Therefore, many Korean students are apt to add an extra vowel sound /i/ after the final stop consonant in the CVC form, due both to their failure to perceive the differences between the phonemes and the allophones of stop consonants and to the influence of Korean sound-sequence relationships. Since replacing allophones and adding an extra vowel does not change the meaning, the issue has received little attention. Nevertheless, this kind of study is essential for the precise learning and use of the English language.
