Search | Korea Science

Sentence design for speech recognition database

Zu Yiqing
- Proceedings of the KSPS conference / 1996.10a / pp.472-472 / 1996
The material of a speech recognition database should cover as many phonetic phenomena as possible. At the same time, it should be phonetically compact, with low redundancy [1, 2]. Phonetic phenomena in continuous speech are a key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 database of "People's Daily" (a Chinese newspaper), which covers news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear as allophones, which result from co-articulatory effects. The design of a speech database should therefore address both intra-syllabic and inter-syllabic allophone structures. In our experiments, there are 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures in read speech. Statistics on the "People's Daily" database give an evaluation of all the possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. The syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initials/finals and consonants/vowels; the rest describe the inter-syllabic junctures. The 1560 sentences cover 96% of syllables without tones (the absent syllables are used only in spoken language), 100% of inter-syllabic diphones and 67% of inter-syllabic triphones (87% of which appear in "People's Daily"). Roughly 17 kinds of sentence patterns appear in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved significantly high recognition rates [3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [segmentation into sentences] -> [segmentation into word groups] -> [translation of the text into Pinyin] -> [statistics on phonetic phenomena & selection of useful paragraphs] -> [manual modification of the selected sentences] -> [phonetically compact sentence set]
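The selection step in the pipeline above can be pictured as a coverage problem. The following is a minimal sketch only: the abstract does not say how sentences were selected, so the greedy set-cover heuristic, the function names `units_of` and `greedy_select`, and the use of adjacent syllable pairs as a stand-in for inter-syllabic junctures are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Unit = Tuple[str, ...]

def units_of(sentence: List[str]) -> Set[Unit]:
    """Collect the syllables of a sentence plus its adjacent-syllable pairs.

    Adjacent pairs are only a rough stand-in (assumption) for the
    inter-syllabic diphone/triphone units named in the abstract.
    """
    units: Set[Unit] = {(s,) for s in sentence}
    units.update(zip(sentence, sentence[1:]))
    return units

def greedy_select(sentences: List[List[str]]) -> List[int]:
    """Return indices of sentences chosen greedily by newly covered units.

    Hypothetical heuristic: at each step take the sentence that adds the
    most uncovered units (lowest index wins ties), and stop when no
    remaining sentence adds anything new.
    """
    covered: Set[Unit] = set()
    chosen: List[int] = []
    remaining = {i: units_of(s) for i, s in enumerate(sentences)}
    while remaining:
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # every remaining sentence is fully redundant
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen
```

A sentence whose units are all already covered is never selected, which is one simple way to keep a sentence set compact; the hand-modification step in the pipeline has no counterpart in this sketch.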

Egyptian learners' learnability of Korean phonemes (이집트 한국어 학습자들의 한국어 음소 학습용이성)

Benjamin, Sarah;Lee, Ho-Young;Hwang, Hyosung
- Phonetics and Speech Sciences / v.11 no.4 / pp.19-33 / 2019
This paper examines the perception of Korean phonemes by Egyptian learners of Korean and presents the learnability gradient of Korean consonants and vowels through High Variability Phonetic Training (HVPT). 50 Egyptian learners of Korean (27 low proficiency learners and 23 high proficiency learners) participated in 10 sessions of HVPT for Korean vowels, word initial and final consonants. Participants were tested on their identification ability of Korean vowels, word initial consonants, and syllable codas before and after the training. The results showed that both low and high proficiency groups did benefit from the training. Low proficiency learners showed a higher improvement rate than high proficiency learners. Based on the HVPT results, a learnability gradient was established to give insights into priorities in teaching Korean sounds to Egyptian learners.
https://doi.org/10.13064/KSSS.2019.11.4.019

A Study on the Neural Networks for Korean Phoneme Recognition (한국어 음소 인식을 위한 신경회로망에 관한 연구)

Choi, Young-Bae;Yang, Jin-Woo;Lee, Hyung-Jun;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.5-13 / 1994
This paper presents a study on neural networks for phoneme recognition and performs phoneme recognition using a TDNN (Time Delay Neural Network). It also proposes a training algorithm for neural-network speech recognition that is suited to large-scale TDNNs. Because phoneme recognition is indispensable for continuous speech recognition, this paper uses a TDNN to obtain accurate phoneme recognition results. The proposed training algorithm can make a TDNN converge to an optimal state regardless of the number of phonemes to be recognized. The recognition experiment was performed with the new TDNN training algorithm, which combines backpropagation with the Cauchy algorithm using a stochastic approach. The recognition experiments on three phoneme classes for two speakers show a recognition rate of 98.1%. The results indicate that the proposed algorithm achieves higher recognition performance and shorter convergence time than the conventional TDNN.

A Word Dictionary Structure for the Postprocessing of Hangul Recognition (한글인식 후처리용 단어사전의 기억구조)

;Yoshinao Aoki
- The Journal of Korean Institute of Communications and Information Sciences / v.19 no.9 / pp.1702-1709 / 1994
In the postprocessing stage of a Hangul recognition system, the storage structure of contextual information is important for the recognition rate and speed of the entire system. A trie is generally used to represent the context as a word dictionary, but its memory space efficiency is low. We therefore propose a new word dictionary structure that has better space efficiency while retaining the merits of a trie. Because Hangul is a compound script, words can be represented by phonemes or by characters. In the representation by phonemes (P-mode), retrieval is fast but space efficiency is low. In the representation by characters (C-mode), space efficiency is high but retrieval is slow. In this paper the two representations are combined to form a hybrid representation (H-mode). First, an optimal level for the combination is selected from two characteristic curves, node utilization and dispersion. Then the input words are represented as a trie in P-mode from the first level to the optimal level, and the rest are represented as a sequentially linked list in C-mode. Experimental results on six kinds of word sets show that the proposed structure is more efficient, because retrieval in H-mode is as fast as in P-mode and its space efficiency is as good as that of C-mode.
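The hybrid idea can be illustrated with a small sketch. It is not the authors' data structure: the class names `Node` and `HybridDict`, the fixed `level` parameter (the paper instead selects an optimal level from utilization and dispersion curves), counting the level in syllables, and using a Python list in place of a linked list are all assumptions. The only outside fact used is the Unicode layout of precomposed Hangul syllables, where the code point is 0xAC00 + (lead × 21 + vowel) × 28 + tail.

```python
from typing import Dict, List, Optional, Tuple

def to_jamo(syllable: str) -> Tuple[int, int, int]:
    """Split one precomposed Hangul syllable into (lead, vowel, tail) indices."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return (lead, vowel, tail)

class Node:
    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}  # P-mode branches keyed by phoneme index
        self.tails: List[str] = []              # C-mode suffixes, scanned sequentially

class HybridDict:
    """Trie over phonemes for the first `level` syllables, plain suffix list after."""

    def __init__(self, level: int) -> None:
        self.level = level  # hypothetical fixed split point, counted in syllables
        self.root = Node()

    def _walk(self, word: str, create: bool) -> Optional[Node]:
        node = self.root
        for syllable in word[: self.level]:
            for j in to_jamo(syllable):
                if j not in node.children:
                    if not create:
                        return None
                    node.children[j] = Node()
                node = node.children[j]
        return node

    def insert(self, word: str) -> None:
        node = self._walk(word, create=True)
        tail = word[self.level:]  # remaining syllables stay as characters
        if tail not in node.tails:
            node.tails.append(tail)

    def contains(self, word: str) -> bool:
        node = self._walk(word, create=False)
        return node is not None and word[self.level:] in node.tails
```

Raising `level` moves more of each word into the branching trie part, and lowering it moves more into the sequentially scanned suffix lists, which is the speed versus space trade-off the abstract describes.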

The Compensatory Articulation in the Patients with Cleft Palate having Velopharyngeal Insufficiency (구개열로 인한 연인두 폐쇄 부전 환자의 보상조음)

Lee Eun-Kyung;Park Mi-Kyong;Son Young-Ik
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.16 no.2 / pp.118-122 / 2005
Background and Objectives: Compensatory articulation not only influences general speech intelligibility, but also prevents precise assessment of velopharyngeal function. This study was performed to investigate the frequently affected phonemes, prevalence and characteristics of compensatory articulation in patients with cleft palate having velopharyngeal insufficiency. Material and Method: An archival review was conducted on 103 cleft palate subjects. Their ages ranged from 2.6 to 63 years (mean age of 9.8 years). They were divided into two groups: a preschool group (n=71) and an older patient group (n=32). The prevalence and patterns of compensatory articulation were examined for oral high-pressure consonants such as plosives, fricatives and affricates. Results: Compensatory errors were observed in 49.5% of the subjects and were mostly glottal stops, with the exception of 4 cases who had pharyngeal fricatives in addition to glottal stops. The most frequently substituted phonemes were velar plosives and tense sounds. There was no significant difference in prevalence between the two groups. However, errors for bilabial and alveolar plosives were more frequently observed in the preschool group. Conclusion: The high prevalence of compensatory articulation observed in both the preschool and older age groups indicates that articulation errors tend to remain unless appropriate speech therapy is provided. To improve the speech intelligibility of patients with cleft palate having velopharyngeal insufficiency, it is advisable to address and correct compensatory articulation errors at an early age.

Perception and production of Mandarin lexical tones in Korean learners of Mandarin Chinese (중국어를 학습하는 한국어 모국어 화자의 중국어 성조 지각과 산출)

Ko, Sungsil;Choi, Jiyoun
- Phonetics and Speech Sciences / v.12 no.1 / pp.11-17 / 2020
Non-tonal language speakers may have difficulty learning second language lexical tones. In the present study, we explored this issue with Korean-speaking learners of Mandarin Chinese (i.e., non-tonal first language speakers) by examining their perception and production of Mandarin lexical tones. In the perception experiment, the Korean learners were asked to listen to the tone of each stimulus and assign it to one of four Mandarin lexical tones using the response keys; in the production experiment, the learners provided speech production data for the lexical tones and then their productions were identified by native listeners of Mandarin Chinese. Our results showed that the Korean learners of Mandarin Chinese had difficulty in perceptually distinguishing Tone 2 and Tone 3, with the most frequent production error being the mispronunciation of Tone 3 as Tone 2. We also investigated whether unfamiliar non-native phonemes (i.e., Chinese phonemes) that do not exist in the native language phonemic inventory (i.e., Korean) may hinder the processing of the non-native lexical tones. We found no evidence for such effects, neither for the perception nor for the production of the tones.
https://doi.org/10.13064/KSSS.2020.12.1.011

Phonological Awareness in Hearing Impaired Children (청각장애아동의 음운인식능력에 대한 연구)

Park, Sang-Hee;Seok, Dong-Il;Jeong, Ok-Ran
- Speech Sciences / v.9 no.2 / pp.193-202 / 2002
The purpose of this study is to examine the phonological awareness of hearing impaired children. A number of studies indicate that hearing impaired children have articulation disorders due to their impaired auditory feedback. However, even children who are able to distinguish certain phonemes sometimes misarticulate them. Phonological awareness refers to recognizing the speech-sound units and their forms in spoken language (Hong, 2001). The subjects who participated in the experiment were four hearing impaired children (3 children with cochlear implants and 1 child with a hearing aid). Phonological awareness was evaluated with the test battery developed by Paik et al. (2001). The subtests consisted of rhyme matching, onset matching I and II, and word initial segmentation and matching I and II. If a child asked for an item to be repeated, it was repeated up to a maximum of 4 times. Each item was worth 1 point. The results were compared to those of Paik et al. (2001). In rhyme matching, subject 1 showed superior ability, subjects 2 and 3 fair ability, and subject 4 inferior ability. In onset matching I, all subjects showed inferior ability except for subject 3. Interestingly, subject 1 showed the lowest onset matching I score. In word initial segmentation and matching I, subjects 1 and 4 showed inferior ability and subjects 2 and 3 showed fair ability. In onset matching II, subject 2 achieved the perfect score of 10 even though she had shown a very low score. In word initial segmentation and matching II, only subjects 2 and 3 showed appropriate levels of the skill. The results show that the phonological awareness of hearing impaired children is different from that of normal children.

An Encryption Algorithm Based on DES for Composition Hangul Syllables (DES에 기반한 조합형 한글 암호 알고리즘)

박근수
- Journal of the Korea Institute of Information Security & Cryptology / v.9 no.3 / pp.63-74 / 1999
In this paper we present a Hangul Encryption Algorithm (HEA) which encrypts composition Hangul syllables into composition Hangul syllables using the non-linear structure of Hangul. Since ciphertexts generated by HEA are displayable characters, HEA can be used in applications such as Privacy Enhanced Mail (PEM), where ciphertexts must be displayable characters. HEA is based on DES, and it can be shown that HEA is as safe as DES against exhaustive key search, differential cryptanalysis and linear cryptanalysis. HEA also exhibits randomness in the phonemes of its ciphertexts and satisfies the plaintext-ciphertext avalanche effect and the key-ciphertext avalanche effect.
https://doi.org/10.13089/JKIISC.1999.9.3.63

A quantitative study on the minimal pair of Korean phonemes: Focused on syllable-initial consonants (한국어 음소 최소대립쌍의 계량언어학적 연구: 초성 자음을 중심으로)

Jung, Jieun
- Phonetics and Speech Sciences / v.11 no.1 / pp.29-40 / 2019
The paper investigates the minimal pairs of Korean phonemes quantitatively. To achieve this goal, I calculated the number of consonant minimal pairs in the syllable-initial position as both raw counts and relative counts, and analyzed the part-of-speech relations of the two words in each minimal pair. "Urimalsaem" was chosen as the object of this study because minimal pair analysis should be done through a dictionary and it is the largest among Korean dictionaries. The results of the study are summarized as follows. First, there were 153 types of minimal pairs out of 337,135 examples. The ranking of phoneme pairs from highest to lowest was 'ㅅ-ㅈ, ㄱ-ㅅ, ㄱ-ㅈ, ㄱ-ㅂ, ㄱ-ㅎ, ..., ㅆ-ㅋ, ㄸ-ㅋ, ㅉ-ㅋ, ㄹ-ㅃ, ㅃ-ㅋ'. The phonemes that played a major role in the formation of minimal pairs were /ㄱ, ㅅ, ㅈ, ㅂ, ㅊ/, in that order, which showed a high proportion of palatals. The correlation between the raw count and the relative count of minimal pairs was found to be quite high (r = 0.937). Second, 87.91% of the minimal pairs shared the part of speech (same syntactic category). The most frequently observed type was the 'noun-noun' pair (70.25%), followed by the 'vowel-vowel' pair (14.77%). This suggests that minimal pairs can be grouped into similar categories in terms of semantics. The results of this study can be useful as basic data on Korean phonemes for research in Korean linguistics, speech-language pathology, language education, language acquisition, speech synthesis, and artificial intelligence and machine learning.
https://doi.org/10.13064/KSSS.2019.11.1.029
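The raw count of syllable-initial minimal pairs described in the abstract above can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the function names are hypothetical, a minimal pair is taken to be two words that differ only in the initial consonant of one syllable, and words are assumed to be written in precomposed Unicode Hangul syllables. Note that initial ㅇ is treated like any other onset here, which the paper may handle differently.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# The 19 initial consonants in Unicode syllable-block order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"

def split_lead(syllable: str) -> Tuple[str, int]:
    """Return (initial consonant, index of the vowel+final remainder)."""
    code = ord(syllable) - 0xAC00
    if not 0 <= code < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(code, 21 * 28)
    return LEADS[lead], rest

def count_minimal_pairs(words: Iterable[str]) -> Counter:
    """Count word pairs per consonant pair that differ in one initial consonant."""
    # Group words by everything except one syllable's initial consonant.
    groups: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for word in set(words):
        for pos, syllable in enumerate(word):
            lead, rest = split_lead(syllable)
            groups[(word[:pos], rest, word[pos + 1:])].add(lead)
    pairs: Counter = Counter()
    for leads in groups.values():
        for a, b in combinations(sorted(leads), 2):
            pairs[(a, b)] += 1
    return pairs
```

For example, the toy word list 사다, 자다, 가다 yields one pair each for ㄱ-ㅅ, ㄱ-ㅈ and ㅅ-ㅈ. A relative count, as mentioned in the abstract, would additionally need a normalization that the abstract does not specify.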

Analysis of Feature Parameter Variation for Korean Digit Telephone Speech according to Channel Distortion and Recognition Experiment (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 및 인식실험)

Jung Sung-Yun;Son Jong-Mok;Kim Min-Sung;Bae Keun-Sung
- MALSORI / no.43 / pp.179-188 / 2002
Improving the recognition performance of connected digit telephone speech remains an open problem. As a preliminary study toward this goal, this paper analyzes how the feature parameters of Korean digit telephone speech vary with channel distortion. MFCC is used as the feature parameter for analysis and recognition. To analyze the effect of telephone channel distortion, which differs from call to call, MFCCs are first obtained from the connected digit telephone speech for each phoneme included in the Korean digits. Then CMN, RTCN, and RASTA are applied to the MFCCs as channel compensation techniques. Using the feature parameters MFCC, MFCC+CMN, MFCC+RTCN, and MFCC+RASTA, the variances of the phonemes are analyzed and recognition experiments are carried out for each case. The experimental results are discussed together with our findings.
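Of the compensation techniques named above, CMN (cepstral mean normalization) is the simplest to illustrate. The sketch below assumes per-utterance CMN on a plain list of MFCC frames; the function name and the toy numbers are hypothetical, and the paper's own configuration is not given in the abstract. The idea is that a fixed channel appears as a roughly constant offset in the cepstral domain, so subtracting the per-coefficient mean over the utterance removes it.

```python
from typing import List

def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over all frames of one utterance."""
    if not frames:
        return []
    n = len(frames)
    dims = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

# Toy check: a constant offset per coefficient (a stand-in for a fixed
# channel) disappears after normalization.
clean = [[1.0, 2.0], [3.0, 6.0]]
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
same = cepstral_mean_normalize(clean) == cepstral_mean_normalize(distorted)
```

Here `same` evaluates to `True` because the toy values are exactly representable; with real MFCCs a tolerance-based comparison would be needed.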

