• Title/Summary/Keyword: 음절수 (number of syllables)

A Stochastic Word-Spacing System Based on Word Category-Pattern (어절 내의 형태소 범주 패턴에 기반한 통계적 자동 띄어쓰기 시스템)

  • Kang, Mi-Young;Jung, Sung-Won;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.11
    • /
    • pp.965-978
    • /
    • 2006
  • This paper implements an automatic Korean word-spacing system based on word recognition using morpheme unigrams and the pattern that the categories of those morpheme unigrams share within a candidate word. Although previous Korean word-spacing models are easy to construct and time-efficient, problems remain, such as data sparseness and excessive memory size, which arise from the morpho-typological characteristics of Korean. To cope with both problems, our implementation uses the stochastic information of morpheme unigrams and their category patterns instead of word unigrams. A word's probability in a sentence is obtained from the morpheme probabilities and the weight for each morpheme's category within the category pattern of the candidate word. The category weights are trained to minimize the mean error between the observed probabilities of words and those estimated from the probabilities of the words' individual morphemes, weighted according to their categories' powers in a given word's category pattern.
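
As a rough illustration of the estimate described in this abstract, the sketch below scores a candidate word from its morphemes' unigram probabilities, each raised to a weight looked up by the morpheme's category within the word's category pattern. This is not the authors' implementation: the multiplicative (log-linear) form, the function names `estimate_word_log_prob` and `mean_squared_error`, the category labels, and all numbers are assumptions made for illustration.

```python
import math

def estimate_word_log_prob(morphemes, morph_prob, category_weight):
    """Estimate log P(word) from its morphemes.

    morphemes: list of (morpheme, category) pairs for one candidate word.
    morph_prob: dict mapping morpheme -> unigram probability (hypothetical values).
    category_weight: dict mapping (category_pattern, category) -> weight (hypothetical).
    """
    pattern = tuple(cat for _, cat in morphemes)
    total = 0.0
    for morph, cat in morphemes:
        # Assumption: the category weight acts as an exponent on the
        # morpheme probability, i.e. a multiplier in log space.
        total += category_weight[(pattern, cat)] * math.log(morph_prob[morph])
    return total

def mean_squared_error(observed_log_probs, estimated_log_probs):
    """Training objective sketch: mean squared gap between observed and estimated values."""
    gaps = [(o - e) ** 2 for o, e in zip(observed_log_probs, estimated_log_probs)]
    return sum(gaps) / len(gaps)

# Toy data: a noun followed by a particle (romanized, hypothetical numbers).
toy_word = [("hakgyo", "N"), ("e", "J")]
toy_probs = {"hakgyo": 0.01, "e": 0.05}
toy_weights = {(("N", "J"), "N"): 1.0, (("N", "J"), "J"): 0.5}
toy_score = estimate_word_log_prob(toy_word, toy_probs, toy_weights)
```

In a setup like this, the weights would be adjusted (for example by gradient descent) until `mean_squared_error` between observed and estimated word scores stops decreasing.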

Performance Comparison of Korean Dialect Classification Models Based on Acoustic Features

  • Kim, Young Kook;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.10
    • /
    • pp.37-43
    • /
    • 2021
  • Using the acoustic features of speech, important social and linguistic information about the speaker can be obtained, and one of the key features is the dialect. A speaker's use of a dialect is a major barrier to interaction with a computer. Dialects can be distinguished at various levels such as phonemes, syllables, words, phrases, and sentences, but it is difficult to distinguish dialects by identifying them one by one. Therefore, in this paper, we propose a lightweight Korean dialect classification model using only MFCC among the features of speech data. We study the optimal method to utilize MFCC features through Korean conversational voice data, and compare the classification performance of five Korean dialects in Gyeonggi/Seoul, Gangwon, Chungcheong, Jeolla, and Gyeongsang in eight machine learning and deep learning classification models. The performance of most classification models was improved by normalizing the MFCC, and the accuracy was improved by 1.07% and F1-score by 2.04% compared to the best performance of the classification model before normalizing the MFCC.
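
The abstract does not say which normalization was applied to the MFCC features. The sketch below assumes per-coefficient mean and variance normalization over the frames of one utterance, which is one common choice; the function name `normalize_mfcc` and the toy frames are hypothetical.

```python
from statistics import mean, pstdev

def normalize_mfcc(frames, eps=1e-8):
    """Normalize each MFCC coefficient to zero mean and unit variance.

    frames: list of frames, each a list of MFCC coefficients of equal length.
    eps: small constant that avoids division by zero for constant coefficients.
    """
    n_coeffs = len(frames[0])
    # Gather each coefficient across all frames.
    columns = [[frame[i] for frame in frames] for i in range(n_coeffs)]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(frame[i] - means[i]) / (stds[i] + eps) for i in range(n_coeffs)]
        for frame in frames
    ]

# Two toy frames with two coefficients each (made-up values).
toy_frames = [[1.0, 10.0], [3.0, 30.0]]
toy_normalized = normalize_mfcc(toy_frames)
```

After this step every coefficient has a comparable scale, so a classifier is not dominated by the coefficients with the largest raw range.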

Hanja Information in the Entries of Korean Unabridged Dictionary (국어대사전의 표제어에 나타나는 한자 정보)

  • Kim, Cheol-Su
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.4
    • /
    • pp.438-446
    • /
    • 2010
  • For language information processing that includes both Hangul and Hanja, an electronic dictionary supporting Hangul and Hanja simultaneously is necessary. This paper examines statistical information on the Hanja entries of the Korean Unabridged Dictionary, such as the number of entries that include Hanja based on the KSC-5601 character set, the frequency of the pronunciation and meaning of each Hanja character included in the entries, the frequency of Hanja per part of speech, and the average number of Hanja characters per entry. At least one Hanja character appears in 303,951 of the 440,594 entries, accounting for 68.99% of the total. The 440,594 entries include 858,595 Hanja characters, which is 1.95 Hanja characters per entry. As the average syllable length of the entries is 3.56 and the average count of Hanja characters per entry is 1.95, about 54.7% of all the characters in the entries are Hanja. Among the 4,888 Hanja character codes, 4,660 are used at least once, whereas 228 never appear in any entry. Five characters appear more than 4,000 times. The 858,595 Hanja characters used in all the entries correspond to 471 Hangul codes.
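
The ratios quoted in this abstract can be recomputed from its own totals. The short check below uses only numbers stated in the abstract; the variable names are chosen here for illustration.

```python
# Totals quoted in the abstract.
total_entries = 440_594
entries_with_hanja = 303_951
hanja_chars = 858_595
avg_syllables_per_entry = 3.56
hanja_codes_total = 4_888
hanja_codes_used = 4_660

# Share of entries containing at least one Hanja character.
share_entries = entries_with_hanja / total_entries

# Average number of Hanja characters per entry.
hanja_per_entry = hanja_chars / total_entries

# Share of all entry characters that are Hanja.
share_chars = hanja_per_entry / avg_syllables_per_entry

# Hanja codes in the character set that never appear in an entry.
unused_codes = hanja_codes_total - hanja_codes_used

print(f"{share_entries:.2%} of entries, {hanja_per_entry:.2f} Hanja per entry, "
      f"{share_chars:.1%} of characters, {unused_codes} unused codes")
```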

A Study on Recognition Units and Methods to Align Training Data for Korean Speech Recognition (한국어 인식을 위한 인식 단위와 학습 데이터 분류 방법에 대한 연구)

  • 황영수
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.2
    • /
    • pp.40-45
    • /
    • 2003
  • This is a study of recognition units and phoneme segmentation. When building a large-vocabulary speech recognition system, it is better to use the segment than the syllable or the word as the recognition unit. In this paper, we study the proper recognition units and phoneme segmentation for Korean speech recognition. For the experiments, we use the OGI speech toolkit (U.S.A.). The results show that the recognition rate when the diphthong is treated as a single unit is superior to that when it is treated as two units, i.e., a glide plus a vowel. A recognizer using manually aligned training data is slightly superior to one using automatically aligned training data. Also, the recognition rate when the biphone is used as the recognition unit is better than when the mono-phoneme is used.

Error Correction Methode Improve System using Out-of Vocabulary Rejection (미등록어 거절을 이용한 오류 보정 방법 개선 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.10 no.8
    • /
    • pp.173-178
    • /
    • 2012
  • When a model is generated for the recognition vocabulary, triphones that were not prepared in advance are produced. The model therefore cannot generate initial parameter estimates for such words, and the system cannot configure the model, which is a drawback. As a result, the precision of the Gaussian model falls and recognition degrades. We propose an error correction system using an out-of-vocabulary rejection algorithm. When the system creates a vocabulary recognition model, the recognition rate is improved by rejecting vocabulary that is not registered. In addition, the system captures lexical analysis and meaning using probability distributions, and deactivates the string before the phoneme change was applied. The system determines the error correction rate using the phoneme similarity rate and reliability. In the performance comparison, the error correction rate improved by 2.8% with the method using error patterns, fault patterns, and meaning patterns.

A Text Processing Method for Devanagari Scripts in Android (안드로이드에서 힌디어 텍스트 처리 방법)

  • Kim, Jae-Hyeok;Maeng, Seung-Ryol
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.12
    • /
    • pp.560-569
    • /
    • 2011
  • In this paper, we propose a text processing method for Hindi characters (Devanagari script) on Android. The key points of the text processing are to devise automata, which define the rules for combining letters into syllables, and to implement a font rendering engine, which retrieves and displays the glyph images corresponding to specific characters. In general, an automaton depends on the type and the number of characters. For the soft keyboard, we designed the automata with 14 consonants and 34 vowels based on Unicode. Finally, a combined syllable is converted into a glyph index using the mapping table, and the index is used as a handle to load its glyph image. Following the multilingual framework of the FreeType font engine, Devanagari script can be supported at the system level by appending the implementation of our method to the font engine as the Hindi module. The proposed method is verified through a simple message system.
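
To make the idea of a combining automaton concrete, here is a much simplified sketch that groups Devanagari code points into syllable-like units. It is not the paper's automaton: the state names, the function names `classify` and `segment_syllables`, and the reduced rule set (consonants joined by virama, followed by at most one dependent vowel sign) are assumptions, and marks such as nukta or anusvara are ignored.

```python
VIRAMA = "\u094d"

def classify(ch):
    """Return a coarse character class for one code point."""
    code = ord(ch)
    if 0x0915 <= code <= 0x0939:
        return "CONSONANT"
    if 0x093E <= code <= 0x094C:
        return "VOWEL_SIGN"   # dependent vowel sign (matra)
    if ch == VIRAMA:
        return "VIRAMA"
    return "OTHER"            # independent vowels, digits, punctuation, ...

def segment_syllables(text):
    """Split text into units using a small finite-state machine."""
    syllables = []
    current = ""
    state = "START"
    for ch in text:
        kind = classify(ch)
        if kind == "CONSONANT" and state == "VIRAMA":
            current += ch              # conjunct: consonant + virama + consonant
            state = "CONSONANT"
        elif kind == "VIRAMA" and state == "CONSONANT":
            current += ch
            state = "VIRAMA"
        elif kind == "VOWEL_SIGN" and state == "CONSONANT":
            current += ch              # vowel sign closes the syllable
            state = "CLOSED"
        else:
            if current:
                syllables.append(current)
            current = ch               # start a new unit
            state = "CONSONANT" if kind == "CONSONANT" else "CLOSED"
    if current:
        syllables.append(current)
    return syllables

# "namaste" written with escapes: na, ma, sa + virama + ta + vowel sign e.
toy_units = segment_syllables("\u0928\u092e\u0938\u094d\u0924\u0947")
```

Each resulting unit could then serve as a key into a glyph mapping table, in the spirit of the glyph-index lookup the abstract mentions.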

A Study on Recognition Units for Korean Speech Recognition (한국어 분절음 인식을 위한 인식 단위에 대한 연구)

  • ;;Michael W. Macon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.6
    • /
    • pp.47-52
    • /
    • 2000
  • When building a large-vocabulary speech recognition system, it is better to use the segment than the syllable or the word as the recognition unit. In this paper, we study the proper recognition units for Korean speech recognition. For the experiments, we use the OGI speech toolkit (U.S.A.). The results show that the recognition rate when the diphthong is treated as a single unit is superior to that when it is treated as two units, i.e., a glide plus a vowel. Also, the recognition rate when the biphone is used as the recognition unit is better than when the mono-phoneme is used.

A Study on the English Pronunciation for English-related Industry (교육산업 활성화를 위한 영어발음 연구)

  • Park, Hee-Suk
    • Journal of Convergence for Information Technology
    • /
    • v.8 no.1
    • /
    • pp.37-42
    • /
    • 2018
  • This study investigates and compares the lengths of five words, the lengths of their vowels, and the ratio of vowel length to word length between Korean college students and a native English speaker. For the experiment, English sentences were read and recorded by the Korean subjects. The vowel lengths were measured from sound spectrograms in the Praat software, and the data were analyzed statistically. There were clear differences between the groups, and they were significant. For the English low front vowel /æ/, native subjects pronounced it differently from Korean subjects, and the differences were significant. However, native subjects pronounced the English diphthong /ai/ significantly shorter than Korean subjects.

A Study on the Self-voice Suppression Algorithm in a ZigBee CROS Hearing Aid (지그비 크로스 보청기에서의 자기음성 억제 알고리즘 연구)

  • Im, Won-Jin;Goh, Young-Hwan;Jeon, Yu-Yong;Kil, Se-Kee;Yoon, Kwang-Sub;Lee, Sang-Min
    • Journal of IKEEE
    • /
    • v.13 no.3
    • /
    • pp.62-71
    • /
    • 2009
  • In this study, we developed a wireless CROS (contralateral routing of signal) hearing aid for people with unilateral hearing impairment. A CROS hearing aid takes sound from the ear with poorer hearing and transmits it to the ear with better hearing. Generally, the self-voice delivered through the receiver of a CROS hearing aid can be very loud, which makes it hard to perceive the target speech. To compensate for this, a self-voice suppression algorithm was developed. We performed an SDT (speech discrimination test) to evaluate the self-voice suppression algorithm. One-syllable words were used as test speech and recorded with self-voice at a distance of 1 m. As a result, the SDT score improved by about 11% when the self-voice suppression algorithm was applied. This verifies that the self-voice suppression algorithm helps speech perception when communicating with others.

Postprocessing of A Speech Recognition using the Morphological Analysis Technique (형태소 분석 기법을 이용한 음성 인식 후처리)

  • 박미성;김미진;김계성;김성규;이문희;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.4
    • /
    • pp.65-77
    • /
    • 1999
  • Two problems must be addressed to feed continuous speech recognition results into natural language processing. First, the unit of speech is not consistent with the spacing unit of text. Second, phonological alternation occurs within or between morphemes in pronunciation. In this paper, we implement a postprocessing system for continuous speech recognition that first solves these two problems using an eo-jeol generator and a syllable recoverer, then morphologically analyzes the generated results, and finally corrects failed results through a corrector. We test the system on two kinds of speech corpora: a primary school textbook corpus and an editorial corpus. The success rate is 93.72% for the former and 92.26% for the latter. The experiments verify that our system is stable regardless of the type of corpus.
