• Title/Summary/Keyword: 음절수

Search Result 314, Processing Time 0.028 seconds

Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer (말뭉치와 형태소 분석기를 활용한 한국어 자동 띄어쓰기)

  • Shim, Kwangseob
    • Journal of KIISE
    • /
    • v.42 no.1
    • /
    • pp.68-75
    • /
    • 2015
  • This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. In our method, eojeol monograms are used for word spacing as opposed to the syllable n-grams that have been used in previous studies. The use of a Korean morphological analyzer is limited to the correction of typical word spacing errors. Our method gives a 98.06% syllable accuracy and a 94.15% eojeol recall, when 10-fold cross-validated with the Sejong corpus, after filtering out non-hangul eojeols. The processing rate is 250K eojeols or 1.8 MB per second on a typical personal computer. Syllable accuracy and eojeol recall are related to the size of the eojeol dictionary, better performance is expected with a bigger corpus.

An Effective Segmentation Scheme for Korean Sentence Classification tasks (한국어 문장 분류 태스크에서의 효과적 분절 전략)

  • Kim, Jin-Sung;Kim, Gyeong-Min;Son, Junyoung;Lim, Heuiseok
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.173-177
    • /
    • 2021
  • 분절을 통한 양질의 입력 자질을 구성하는 것은 언어모델의 문장에 대한 이해도를 높이기 위한 필수적인 단계이다. 분절은 문장의 의미를 이해하는 데 있어 중요한 역할을 하기 때문이다. 따라서, 한국어 문장 분류 태스크를 수행함에 있어 한국어의 특징에 맞는 분절 기법을 선택하는 것은 필수적이다. 명확한 판단 기준 마련을 위해, 우리는 한국어 문장 분류 태스크에서 가장 효과적인 분절 기법이 무엇인지 감성 분석, 자연어 추론, 텍스트 간 의미적 유사성 판단 태스크를 통해 검증한다. 이 때 비교할 분절 기법의 유형 분류 기준은 언어학적 단위에 따라 어절, 형태소, 음절, 자모 네 가지로 설정하며, 분절 기법 외의 다른 실험 환경들은 동일하게 설정하여 분절 기법이 문장 분류 성능에 미치는 영향만을 측정하도록 한다. 실험 결과에 따르면 자모 단위의 분절 기법을 적용한 모델이 평균적으로 가장 높은 성능을 보여주며, 반복 실험 간 편차가 적어 일관적인 성능 결과를 기록함을 확인할 수 있다.

  • PDF

A Postprocessing Method of Korean Character Recognition by Mis-recognized Morphology Presumption (오인식 형태소 추정에 의한 한국어 문자 인식 후처리 기법)

  • Kim, Young-Hun;Lee, Young-Hwa;Lee, Sang-Jo
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.7
    • /
    • pp.46-55
    • /
    • 1999
  • We proposed the new method of postprocessing which not only reduces the frequency of dictionary access using morphological analysis but improve the recognition rate of character recognizer. In this paper, after estimating morphological construction of mis-recognized word using the part of speech that is analyzed, correct presumed mis-recognized morphology. The postprocessing using a morphology unit reduce candidate because of short than word and frequency of dictionary access because there is no need to morphological analysis for candidate. To select right candidate is only necessary to dictionary access. The proposed results show that reduced the frequency of dictionary access to 60% than postprocessing method using a word unit and recognition rate improved from 94% to 97%.

  • PDF

Effects of Articulator-distance and Tense in Phonological Awareness in Korean: The case of Korean Infants and Toddlers (한국어 음운인식에서의 조음거리와 긴장성 자질의 특성 연구: 영·유아를 중심으로)

  • Kim, Choong-Myung
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.8
    • /
    • pp.424-433
    • /
    • 2015
  • This study tried to investigate the differences between auditory preferences for a discrimination study of minimal pairs with the different onset and the same nucleus of a syllable on the basis of articulator-distance in case of Korean infants and toddlers. As a result we found a main effect for articulator-distance and age but not an effect according to the types of phonation especially in terms of tense. Former results are line with the previous studies having reported the order of consonants acquisition based on the places of articulation suggesting that more sensitive responses for the contiguous and different phonemes may lead earlier acquisition for the same place of articulation of the speech sounds. Specifically, bilabial soudns are followed by alveolar and palatal sounds in order. The latter results also showed that tense consonants got a high rate of recognition beside lax consonants according to the age and sex.

Advanced detection of sentence boundaries based on hybrid method (하이브리드 방법을 이용한 개선된 문장경계인식)

  • Lee, Chung-Hee;Jang, Myung-Gil;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology
    • /
    • 2009.10a
    • /
    • pp.61-66
    • /
    • 2009
  • 본 논문은 다양한 형태의 웹 문서에 적용하기 위해서, 언어의 통계정보 및 후처리 규칙에 기반 하여 개선된 문장경계 인식 기술을 제안한다. 제안한 방법은 구두점 생략 및 띄어쓰기 오류가 빈번한 웹 문서에 적용하기 위해서 문장경계로 사용될 수 있는 모든 음절을 대상으로 학습하여 문장경계 인식을 수행하였고, 문장경계인식 성능을 최대화 하기 위해서 다양한 실험을 통해 최적의 자질 및 학습데이터를 선정하였고, 다양한 기계학습 기반 분류 모델을 비교하여 최적의 분류모델을 선택하였으며, 학습데이터에 의존적인 통계모델의 오류를 규칙에 기반 해서 보정하였다. 성능 실험은 다양한 형태의 문서별 성능 측정을 위해서 문어체와 구어체가 복합적으로 사용된 신문기사와 블로그 문서(평가셋1), 문어체 위주로 구성된 세종말뭉치와 백과사전 본문(평가셋2), 구두점 생략 및 띄어쓰기 오류가 빈번한 웹 사이트의 게시판 글(평가셋3)을 대상으로 성능 측정을 하였다. 성능척도로는 F-measure를 사용하였으며, 구두점만을 대상으로 문장경계 인식 성능을 평가한 결과, 평가셋1에서는 96.5%, 평가셋2에서는 99.4%를 보였는데, 구어체의 문장경계인식이 더 어려움을 알 수 있었다. 평가셋1의 경우에도 규칙으로 후처리한 경우 정확률이 92.1%에서 99.4%로 올라갔으며, 이를 통해 후처리 규칙의 필요성을 알 수 있었다. 최종 성능평가로는 구두점만을 대상으로 학습된 기본 엔진과 모든 문장경계후보를 인식하도록 개선된 엔진을 평가셋3을 사용하여 비교 평가하였고, 기본 엔진(61.1%)에 비해서 개선된 엔진이 32.0% 성능 향상이 있음을 확인함으로써 제안한 방법이 웹 문서에 효과적임을 입증하였다.

  • PDF

A Study on Speech Recognition using Recurrent Neural Networks (회귀신경망을 이용한 음성인식에 관한 연구)

  • 한학용;김주성;허강인
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.3
    • /
    • pp.62-67
    • /
    • 1999
  • In this paper, we investigates a reliable model of the Predictive Recurrent Neural Network for the speech recognition. Predictive Neural Networks are modeled by syllable units. For the given input syllable, then a model which gives the minimum prediction error is taken as the recognition result. The Predictive Neural Network which has the structure of recurrent network was composed to give the dynamic feature of the speech pattern into the network. We have compared with the recognition ability of the Recurrent Network proposed by Elman and Jordan. ETRI's SAMDORI has been used for the speech DB. In order to find a reliable model of neural networks, the changes of two recognition rates were compared one another in conditions of: (1) changing prediction order and the number of hidden units: and (2) accumulating previous values with self-loop coefficient in its context. The result shows that the optimum prediction order, the number of hidden units, and self-loop coefficient have differently responded according to the structure of neural network used. However, in general, the Jordan's recurrent network shows relatively higher recognition rate than Elman's. The effects of recognition rate on the self-loop coefficient were variable according to the structures of neural network and their values.

  • PDF

Lip and Voice Synchronization with SMS Messages for Mobile 3D Avatar (SMS 메시지에 따른 모바일 3D 아바타의 입술 모양과 음성 동기화)

  • Youn, Jae-Hong;Song, Yong-Gyu;Kim, Eun-Seok;Hur, Gi-Taek
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2006.11a
    • /
    • pp.682-686
    • /
    • 2006
  • There have been increasing interests in 3D mobile content service with emergence of a terminal equipping with a mobile 3D engine and growth of mobile content market. Mobile 3D Avatar is the most effective product displaying the character of a personalized mobile device user. However, previous studies on the method of expressing 3D Avatar have been mainly focused on natural and realistic expressions according to the change in facial expressions and lip shape of a character in PC based virtual environments. In this paper, we propose a method of synchronizing the lip shape with voice by applying a SMS message received in mobile environments to 3D mobile Avatar. The proposed method enables to realize a natural and effective SMS message reading service of mobile Avatar by disassembling a received message sentence into units of a syllable and then synchronizing the lip shape of 3D Avatar with the corresponding voice.

  • PDF

Phonetic Acoustic Knowledge and Divide And Conquer Based Segmentation Algorithm (음성학적 지식과 DAC 기반 분할 알고리즘)

  • Koo, Chan-Mo;Wang, Gi-Nam
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.215-222
    • /
    • 2002
  • This paper presents a reliable fully automatic labeling system which fits well with languages having well-developed syllables such as in Korean. The ASL System utilize DAC (Divide and Conquer), a control mechanism, based segmentation algorithm to use phonetic and acoustic information with greater efficiency. The segmentation algorithm is to devide speech signals into speechlets which is localized speech signal pieces and to segment each speechlet for speech boundaries. While HMM method has uniform and definite efficiencies, the suggested method gives framework to steadily develope and improve specified acoustic knowledges as a component. Without using statistical method such as HMM, this new method use only phonetic-acoustic information. Therefore, this method has high speed performance, is consistent extending the specific acoustic knowledge component, and can be applied in efficient way. we show experiment result to verify suggested method at the end.

Syllable Recognition of HMM using Segment Dimension Compression (세그먼트 차원압축을 이용한 HMM의 음절인식)

  • Kim, Joo-Sung;Lee, Yang-Woo;Hur, Kang-In;Ahn, Jum-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.40-48
    • /
    • 1996
  • In this paper, a 40 dimensional segment vector with 4 frame and 7 frame width in every monosyllable interval was compressed into a 10, 14, 20 dimensional vector using K-L expansion and neural networks, and these was used to speech recognition feature parameter for CHMM. And we also compared them with CHMM added as feature parameter to the discrete duration time, the regression coefficients and the mixture distribution. In recognition test at 100 monosyllable, recognition rates of CHMM +${\bigtriangleup}$MCEP, CHMM +MIX and CHMM +DD respectively improve 1.4%, 2.36% and 2.78% over 85.19% of CHMM. And those using vector compressed by K-L expansion are less than MCEP + ${\bigtriangleup}$MCEP but those using K-L + MCEP, K-L + ${\bigtriangleup}$MCEP are almost same. Neural networks reflect more the speech dynamic variety than K-L expansion because they use the sigmoid function for the non-linear transform. Recognition rates using vector compressed by neural networks are higher than those using of K-L expansion and other methods.

  • PDF

Similar Question Search System for online Q&A for the Korean Language Based on Topic Classification (온라인가나다를 위한 주제 분류 기반 유사 질문 검색 시스템)

  • Mun, Jung-Min;Song, Yeong-Ho;Jin, Ji-Hwan;Lee, Hyun-Seob;Lee, Hyun Ah
    • Korean Journal of Cognitive Science
    • /
    • v.26 no.3
    • /
    • pp.263-278
    • /
    • 2015
  • Online Q&A for the National Institute of the Korean Language provides expert's answers for questions about the Korean language, in which many similar questions are repeatedly posted like other Q&A boards. So, if a system automatically finds questions that are similar to a user's question, it can immediately provide users with recommendable answers to their question and prevent experts from wasting time to answer to similar questions repeatedly. In this paper, we set 5 classes of questions based on its topic which are frequently asked, and propose to classify questions to those classes. Our system searches similar questions by combining topic similarity, vector similarity and sequence similarity. Experiment shows that our method improves search correctness with topic classification. In experiment, Mean Reciprocal Rank(MRR) of our system is 0.756, and precision for the first result is 68.31% and precision for top five results is 87.32%.