• Title/Abstract/Keywords: word length (단어길이)

Statistical Word Sense Disambiguation based on using Variant Window Size (가변길이 윈도우를 이용한 통계 기반 동형이의어의 중의성 해소)

  • Park, Gi-Tae;Lee, Tae-Hoon;Hwang, So-Hyun;Lee, Hyun Ah
    • Annual Conference on Human and Language Technology
    • 2012.10a
    • pp.40-44
    • 2012
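  • Lexical semantic ambiguity is an inherent property of natural language and lowers the accuracy of natural language processing, so research on resolving it with linguistic rules and various machine learning models continues. For discriminating the senses of homographs, which are semantically ambiguous, the surrounding context is the most important feature, and the size of the context window used to extract feature information is closely tied to disambiguation performance, so it must be chosen carefully. This paper proposes a variable-length window method that uses a context of variable size in the sense discrimination process. Trained on the morpho-semantically annotated corpus of the Sejong Corpus and tested on 32,735 sentences covering 12 words, the method achieved an average accuracy of 92.2% for predicates (verbs and adjectives), an improvement over using a fixed-size window.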
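
The abstract does not say how the window length is chosen. As a hedged illustration only, the sketch below pairs a toy naive Bayes sense classifier with a hypothetical rule that scores the target word under several window sizes and keeps the decision with the largest per-word margin between the best and second-best sense. `NaiveBayesWSD`, `disambiguate_variable`, the candidate sizes, the margin rule, and the English toy data are all assumptions, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def context(tokens, i, k):
    """Return up to k tokens on each side of position i, excluding the target word."""
    return tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]


class NaiveBayesWSD:
    """Toy naive Bayes sense classifier with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples, k):
        # examples: iterable of (tokens, target_index, sense)
        for tokens, i, sense in examples:
            self.sense_counts[sense] += 1
            for w in context(tokens, i, k):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)

    def scores(self, ctx):
        """log P(sense) + sum of log P(word | sense) over the context words."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        out = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + v
            s = math.log(n / total)
            for w in ctx:
                s += math.log((self.word_counts[sense][w] + 1) / denom)
            out[sense] = s
        return out


def disambiguate_variable(model, tokens, i, sizes=(1, 2, 3, 5)):
    """Hypothetical variable-window rule: try each window size and keep the
    decision whose per-word margin (best minus second-best log score, divided
    by the number of context words) is largest. Returns (sense, window_size)."""
    best = None
    for k in sizes:
        ctx = context(tokens, i, k)
        ranked = sorted(model.scores(ctx).items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1:
            margin = (ranked[0][1] - ranked[1][1]) / max(1, len(ctx))
        else:
            margin = math.inf
        if best is None or margin > best[0]:
            best = (margin, ranked[0][0], k)
    return best[1], best[2]


# Toy usage with a hypothetical English homograph ("bank").
train = [
    ("we sat on the bank of the river".split(), 4, "river"),
    ("the river bank was muddy".split(), 2, "river"),
    ("she deposited money at the bank".split(), 5, "money"),
    ("the bank approved the loan money".split(), 1, "money"),
]
model = NaiveBayesWSD()
model.train(train, k=5)
```

Dividing the margin by the context length keeps larger windows from winning simply because they add up more log-probability terms.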

The Effect of Inter-word Space on Chinese reading: An Eye Movement Study (단어 간 공백이 중국어 글 읽기에 미치는 영향: 안구운동 추적 연구)

  • Han, Mi-ae;Jiang, Xin;Zhao, Weiqi
    • Korean Journal of Cognitive Science
    • v.29 no.4
    • pp.243-263
    • 2018
  • This study investigated whether inter-word spaces (spaces between words) affect the Chinese reading efficiency of Korean-speaking CSL (Chinese as a second language) learners. In eye-movement tracking experiments, CSL learners at three proficiency levels (beginning, intermediate, and advanced) and native Chinese readers read Chinese sentences with and without inter-word spaces. The analysis examined participants' fixation counts and reading times at both the sentence level and the word level. At the sentence level, fixation counts did not differ significantly between sentences with and without inter-word spaces, but inter-word spaces significantly shortened reading time for both CSL learners and native Chinese readers. At the word level, participants made significantly fewer fixations and spent significantly less reading time on words presented with inter-word spaces. These effects were more pronounced for CSL learners of lower proficiency. The results indicate that inter-word spaces in Chinese text can improve the reading efficiency of Chinese learners.

Sign Language Spotting Based on Semi-Markov Conditional Random Field (세미-마르코프 조건 랜덤 필드 기반의 수화 적출)

  • Cho, Seong-Sik;Lee, Seong-Whan
    • Journal of KIISE: Software and Applications
    • v.36 no.12
    • pp.1034-1037
    • 2009
  • Sign language spotting is the task of detecting the start and end points of signs in continuous data and recognizing the detected signs from a predefined vocabulary. It is difficult because instances of a sign vary in both motion and shape, and signs vary in both trajectory and length. Variable sign length in particular complicates spotting signs in a video sequence, because short signs carry less information and fewer changes than long signs. In this paper, we propose a method for spotting variable-length signs based on a semi-CRF (semi-Markov Conditional Random Field). Experiments on ASL (American Sign Language) and KSL (Korean Sign Language) datasets of continuous sign sentences demonstrate the effectiveness of the proposed method, which outperforms both HMM and CRF.
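
The abstract does not describe the feature functions or the training procedure. The sketch below shows only the decoding step that distinguishes a semi-CRF from a frame-level CRF: labels are assigned to whole segments of 1 to `max_len` frames, so segment length is scored directly rather than inferred frame by frame. The per-frame evidence, the 0.5 per-segment cost, the zero transition scores, and the `NON_SIGN` label are hypothetical stand-ins for learned features and weights.

```python
def semi_markov_viterbi(n, labels, seg_score, trans_score, max_len):
    """Exact Viterbi decoding for a semi-Markov model.

    best[t][y] is the best score of any segmentation of frames [0, t) whose
    last segment carries label y; each segment spans 1..max_len frames.
    Returns (total_score, [(start, end_exclusive, label), ...]).
    """
    neg = float("-inf")
    best = [{y: neg for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for t in range(1, n + 1):
        for d in range(1, min(max_len, t) + 1):
            s = t - d  # candidate segment covers frames [s, t)
            for y in labels:
                local = seg_score(s, t, y)
                if s == 0:
                    prev, cand = None, local
                else:
                    prev = max(labels, key=lambda yp: best[s][yp] + trans_score(yp, y))
                    cand = best[s][prev] + trans_score(prev, y) + local
                if cand > best[t][y]:
                    best[t][y] = cand
                    back[t][y] = (s, prev)
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[n][lab])
    total, segments, t = best[n][y], [], n
    while t > 0:
        s, prev = back[t][y]
        segments.append((s, t, y))
        t, y = s, prev
    return total, segments[::-1]


# Toy usage: hypothetical per-frame evidence (a frame "supports" the label it names).
frame_labels = ["HELLO", "HELLO", "HELLO", "NON_SIGN", "THANKS", "THANKS"]
labels = ["HELLO", "THANKS", "NON_SIGN"]


def seg_score(s, t, y):
    # Frame evidence summed over the segment, minus a per-segment cost against fragmentation.
    return sum(1.0 for f in frame_labels[s:t] if f == y) - 0.5


def trans_score(prev, y):
    return 0.0  # no label-transition preference in this toy


total, segments = semi_markov_viterbi(len(frame_labels), labels, seg_score, trans_score, max_len=4)
```

The cost is O(n × max_len × |labels|²), so `max_len` bounds how long a single sign can be.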

Speech Data Collection for Korean Speech Recognition (한국어 음성인식을 위한 음성 데이터 수집)

  • Park, Jong-Ryeal;Kwon, Oh-Wook;Kim, Do-Yeong;Choi, In-Jeong;Jeong, Ho-Young;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • v.14 no.4
    • pp.74-81
    • 1995
  • This paper describes the development of Korean speech databases constructed at the Communications Research Laboratory at KAIST. The construction procedure and environment are described in detail, along with the phonetic and linguistic properties of the databases, which were intended for designing and evaluating speech recognition algorithms. The databases consist of five sets of speech content: trade-related continuous speech with 3,000 words, variable-length connected digits, 75 phoneme-balanced isolated words, 500 isolated Korean provincial names, and Korean A-set words.

Coreference Resolution using Hierarchical Pointer Networks (계층적 포인터 네트워크를 이용한 상호참조해결)

  • Park, Cheoneum;Lee, Changki
    • KIISE Transactions on Computing Practices
    • v.23 no.9
    • pp.542-549
    • 2017
  • Sequence-to-sequence models and related pointer networks suffer performance degradation when the input consists of multiple sentences or when an input sentence is long. To address this problem, this paper proposes a hierarchical pointer network model that encodes input sequences composed of several sentences using both word-level and sentence-level information. Based on this model, we perform coreference resolution over all mentions. Experimental results show that the proposed model achieves a precision of 87.07%, a recall of 65.39%, and a CoNLL F1 of 74.61%, an improvement of 21.83% over an existing rule-based model.
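
The abstract gives no equations. The sketch below illustrates, under stated assumptions, how a pointer network picks an antecedent: it takes a softmax over input positions rather than over a vocabulary. Adding a word-level score to a sentence-level score, treating "point at the mention itself" as "no antecedent", the function name `point_to_antecedent`, and the toy vectors are all hypothetical; in the actual model the encodings would come from trained word-level and sentence-level encoders.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def point_to_antecedent(query, word_vecs, sent_of_word, sent_vecs, mention_pos):
    """Hypothetical pointer step for one mention.

    Candidates are positions 0..mention_pos; pointing at mention_pos itself
    means "no antecedent". Each candidate's logit adds a word-level score
    (query . word vector) and a sentence-level score (query . vector of the
    sentence containing the candidate). Returns (argmax position, distribution).
    """
    candidates = range(mention_pos + 1)
    logits = [dot(query, word_vecs[j]) + dot(query, sent_vecs[sent_of_word[j]])
              for j in candidates]
    probs = softmax(logits)
    best = max(candidates, key=lambda j: probs[j])
    return best, probs


# Toy usage: 5 words in 2 sentences with 2-dimensional hypothetical encodings.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.2], [0.1, 0.1]]
sent_of_word = [0, 0, 0, 1, 1]
sent_vecs = [[0.5, 0.0], [0.0, 0.0]]
best, probs = point_to_antecedent([1.0, 0.0], word_vecs, sent_of_word, sent_vecs, mention_pos=4)
```

Because the output space is the set of input positions, the same mechanism works for inputs of any length.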

VLSI Implementation of CORDIC-Based Digital Quadrature Demodulator (CORDIC을 이용한 디지탈 Quadrature 복조기의 VLSI 구현)

  • 남승현;성원용
    • The Journal of Korean Institute of Communications and Information Sciences
    • v.23 no.7
    • pp.1718-1731
    • 1998
  • A digital quadrature demodulator is needed for coherent demodulation in digital communication systems that use schemes such as Binary Phase-Shift Keying, Quadrature Phase-Shift Keying, and Quadrature Amplitude Modulation. Conventionally, a DDFS (Direct Digital Frequency Synthesizer) generates the carrier signal and separate multipliers perform the mixing. The DDFS is implemented with a ROM (Read-Only Memory), which can become a bottleneck when a high-speed, small-area implementation is required. We develop a new architecture that uses the circular rotation mode of the CORDIC algorithm for signal mixing as well as carrier generation. To optimize the hardware design parameters, the finite-word-length effects of the proposed architecture are analyzed and compared with those of a conventional ROM-based architecture. Hardware cost estimates show that, for the same performance, the proposed architecture occupies only one third of the area of the conventional ROM-based architecture. A full-custom VLSI implementation based on the proposed architecture was developed.
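
The abstract describes using the CORDIC circular rotation mode for both carrier generation and mixing. The floating-point sketch below illustrates that idea under assumptions: each real input sample r[n] is treated as the vector (r[n], 0) and rotated by -omega*n, which yields the I and Q mixer outputs directly without a carrier ROM or separate multipliers. The iteration count, the quadrant pre-rotation, and the function names are hypothetical; the paper's design is fixed-point hardware whose word lengths come from its finite-word-length analysis.

```python
import math

ITERATIONS = 16
# Elementary rotation angles atan(2^-i); in hardware this is a tiny table, not a full sine ROM.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Aggregate CORDIC gain; outputs are divided by K to undo it.
K = 1.0
for _i in range(ITERATIONS):
    K *= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_rotate(x, y, theta):
    """Rotate (x, y) by theta radians using circular-rotation-mode CORDIC."""
    # Pre-rotate by a multiple of pi/2 so the residual angle lies in [-pi/4, pi/4],
    # well inside the CORDIC convergence range (about +/-1.74 rad).
    q = round(theta / (math.pi / 2))
    z = theta - q * (math.pi / 2)
    for _ in range(q % 4):
        x, y = -y, x  # exact +90 degree rotation
    # Shift-and-add micro-rotations that drive the residual angle z toward 0.
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / K, y / K


def quadrature_demodulate(samples, omega):
    """Mix real samples r[n] down by exp(-j*omega*n): rotating (r[n], 0) by -omega*n
    gives (r[n]*cos(omega*n), -r[n]*sin(omega*n)), i.e. the I and Q mixer outputs."""
    out = []
    phase = 0.0  # phase accumulator replaces the DDFS carrier lookup
    for r in samples:
        out.append(cordic_rotate(r, 0.0, -phase))
        phase = (phase + omega) % (2 * math.pi)
    return out
```

A low-pass filter (not shown) would still be needed after mixing to remove the double-frequency terms.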

The reasons why it is good to write Chinese loanwords in Korean instead of Chinese characters and Hànyǔ pīnyīn (중국의 외래어 표기를 한자와 병음 대신 한글로 쓰면 좋은 이유 증명)

  • TaeChoong Chung
    • Annual Conference on Human and Language Technology
    • 2022.10a
    • pp.639-643
    • 2022
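  • Most scholars around the world agree that Hangul is an excellent writing system, because it is widely acknowledged that the simplicity and orthogonality (regularity) of its letters (jamo) let it represent the largest number of pronunciations and make it easy to read, write, and learn. There are accounts that China nearly adopted Hangul-based transcription at the end of the Joseon dynasty and in the 1950s. However, China ultimately chose to use simplified Chinese characters together with pinyin, and that is how it writes today. Yet observing how Chinese speakers handle loanwords, they appear to struggle considerably. If loanwords were written in Hangul, there would be no need to coin Chinese-character words for them; loanwords could be represented, learned, read, and written far more effectively; and fidelity to the original pronunciation would improve greatly. Written forms would also be shorter and recognizable from a greater distance. This paper demonstrates these points by comparing the original English words, their Korean transcriptions, their Chinese transcriptions, their pinyin transcriptions, and Hangul transcriptions of their Chinese pronunciations to show the advantages of Hangul. In conclusion, writing foreign words in Hangul rather than in Chinese characters or pinyin would greatly benefit all Chinese speakers. Learning to read and write Hangul is admittedly a burden, but since a few days of study would solve a lifelong problem, the author does not consider the burden large.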

Copy-Transformer model using Copy-Mechanism and Inference Penalty for Document Abstractive Summarization (복사-메커니즘과 추론 단계의 페널티를 이용한 Copy-Transformer 기반 문서 생성 요약)

  • Jeon, Donghyeon;Kang, In-Ho
    • Annual Conference on Human and Language Technology
    • 2019.10a
    • pp.301-306
    • 2019
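  • Abstractive document summarization is an actively researched area of natural language processing, as end-to-end deep learning systems have recently shown promising results. However, building an abstractive summarization model requires a large dataset of document-summary pairs, which is difficult to construct. This paper therefore proposes a method for automatically constructing a refined news article summarization dataset. In addition, deep-learning-based abstractive summarization tends to generate information that differs from the input document or to repeat the same words. To address this, a copy mechanism that quotes content from the input document when generating the summary, combined with a penalty that directly controls word repetition at the inference stage, can produce relatively stable sentences. Moreover, the Transformer model can encode the information in long input documents more appropriately than recurrent neural network models during summary generation. Accordingly, this paper applies a Copy-Transformer model, which uses a copy mechanism and an inference-stage penalty, to Korean abstractive summarization data. Experiments on the Naver Knowledge iN (지식iN) question summarization dataset and the news article summarization dataset show that the proposed model achieves the best performance among the compared models and generates high-quality summaries.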
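
The abstract names a copy mechanism and an inference-stage repetition penalty but gives no formulas. The sketch below shows one decoding step under assumptions: the copy mechanism is modeled as a pointer-generator style mixture of the vocabulary distribution and the attention over source tokens, and the penalty simply divides the probability of already-generated tokens by a constant before renormalizing. Both formulations, the function names, and the toy numbers are hypothetical, not the paper's exact method; a real implementation would work on tensors and use an extended vocabulary for source-only words.

```python
def copy_distribution(p_vocab, attention, source_ids, p_gen):
    """Mix generation and copying (pointer-generator style, an assumed formulation):

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    """
    final = [p_gen * p for p in p_vocab]
    for a, tok in zip(attention, source_ids):
        final[tok] += (1.0 - p_gen) * a  # scatter copy probability onto source token ids
    return final


def apply_repetition_penalty(probs, generated_ids, penalty=2.0):
    """Divide the probability of every already-generated token by `penalty`, then renormalize.
    One simple inference-time penalty; the paper's exact penalty is not specified."""
    seen = set(generated_ids)
    adjusted = [p / penalty if i in seen else p for i, p in enumerate(probs)]
    z = sum(adjusted)
    return [p / z for p in adjusted]


# Toy usage: vocabulary of 4 token ids; the source document holds token ids 3 and 2.
step1 = copy_distribution([0.1, 0.6, 0.2, 0.1], attention=[0.7, 0.3], source_ids=[3, 2], p_gen=0.5)
step2 = apply_repetition_penalty(step1, generated_ids=[3])
```

In the toy, copying makes source token 3 the most likely output; once it has been generated, the penalty shifts the choice to token 1 instead of repeating it.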

Regional differences in Korean children's development of speech production (우리나라 아동의 지역별 말소리 발달 차이)

  • Shin, Moonja;Ha, Ji-Wan;Kim, Young Tae;Kim, Soo-Jin
    • Phonetics and Speech Sciences
    • v.11 no.3
    • pp.57-67
    • 2019
  • This study investigated regional differences in the development of speech production in Korean children. A total of 619 children aged 2 to 7 years from the Jeolla, Seoul/Gyeonggi, Chungcheong, and Gyeongsang regions participated. The children were assessed with the word-level test of UTAP2 (Urimal Test of Articulation and Phonology 2). On PWC (proportion of whole-word correctness), PMLU (phonological mean length of utterance), and PWP (proportion of whole-word proximity), performance was significantly lower than in Seoul/Gyeonggi for Gyeongsang at 2 years 11 months and for Jeolla and Chungcheong at 3 years 5 months. Total PCC (percentage of consonants correct) in Gyeongsang and Chungcheong and UTAP PCC in Chungcheong were significantly lower than in Seoul/Gyeonggi at 2 years 11 months, while Jeolla and Chungcheong showed significantly lower total PCC and UTAP PCC than Seoul/Gyeonggi at 3 years 5 months. However, no regional difference was observed on any measure after 3 years 6 months of age. These results suggest that regional differences in speech sound production exist at very young ages, and that these differences are better explained by the contrast between Seoul/Gyeonggi and the other provinces than by the characteristics of any specific region.
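
The measures compared above are easier to interpret with their arithmetic spelled out. The sketch below follows the commonly used definitions: PCC as the percentage of target consonants produced correctly; PMLU as one point per produced segment plus one point per correct consonant, averaged over words; PWP as the child's PMLU divided by the target PMLU (computed here as a ratio of sample-level means); and PWC as the proportion of words produced entirely correctly. It rests on simplifying assumptions: segments are pre-aligned, insertions are ignored, and the ASCII symbols and consonant set are hypothetical. UTAP2's official scoring rules may differ in detail.

```python
# Hypothetical consonant inventory for ASCII toy transcriptions.
CONSONANTS = set("pbtdkgmnlshjc")


def word_scores(word):
    """word: list of (target_segment, produced_segment or None if deleted), pre-aligned."""
    produced = sum(1 for _, p in word if p is not None)
    target_cons = sum(1 for t, _ in word if t in CONSONANTS)
    correct_cons = sum(1 for t, p in word if t in CONSONANTS and p == t)
    child_pmlu = produced + correct_cons   # 1 point per produced segment + 1 per correct consonant
    target_pmlu = len(word) + target_cons  # the same scoring applied to the target form
    whole_correct = all(p == t for t, p in word)
    return child_pmlu, target_pmlu, correct_cons, target_cons, whole_correct


def speech_measures(words):
    rows = [word_scores(w) for w in words]
    n = len(rows)
    pmlu = sum(r[0] for r in rows) / n
    target_pmlu = sum(r[1] for r in rows) / n
    return {
        "PCC": 100.0 * sum(r[2] for r in rows) / sum(r[3] for r in rows),
        "PWC": sum(1 for r in rows if r[4]) / n,
        "PMLU": pmlu,
        "PWP": pmlu / target_pmlu,  # ratio of sample-level means
    }


# Toy usage: target "pa" -> "pa", "kan" -> "tan", "mul" -> "mu" (final /l/ deleted).
sample = [
    [("p", "p"), ("a", "a")],
    [("k", "t"), ("a", "a"), ("n", "n")],
    [("m", "m"), ("u", "u"), ("l", None)],
]
measures = speech_measures(sample)
```

PWC and PCC only credit exact matches, whereas PMLU and PWP also give credit for producing a segment at all, which is why the whole-word measures can separate children whose PCC is similar.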

A Visual Study of the Quality of English Pronunciation Using the Praat Program (Praat을 활용한 영어발음특성의 시각적 연구)

  • Park, Heesuk
    • Journal of Digital Contents Society
    • v.14 no.3
    • pp.323-331
    • 2013
  • This study investigates and compares the pronunciation of diphthongs, words, and sentences by two groups of Korean high school students using the Praat program. Twenty Korean subjects, ten per group, read and recorded English words and sentences. All subjects were female first- or second-year students. Acoustic features were measured from sound spectrograms in Praat and analyzed statistically. Diphthong and word durations differed between the two groups, but not significantly. For sentence utterances, however, students in grades 5-6 of the current grading system produced longer durations than students in grades 1-2. The difference was significant in particular for the first two sentences, which contained more than five words. Across all words, the duration differences between the two groups were not significant for words containing diphthongs but were significant for sentences of more than five words. Comparing coat and code, the diphthong in coat was shorter than that in code.
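
The abstract does not name the statistical test used to compare the two groups. As one common choice for comparing mean durations between two independent groups of ten speakers, the sketch below computes Welch's t statistic and its degrees of freedom from durations measured in Praat (for example, interval end time minus start time). The choice of test and the function name `welch_t` are assumptions; no values from the study are used.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples (e.g., utterance durations in seconds)."""
    va = variance(a) / len(a)  # sample variance divided by n
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.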