• Title/Summary/Keyword: Korean Eojeol

The Influence of Lexical Factors on Verbal Eojeol Recognition: Evidence from L1 Korean Speakers and L2 Korean Learners (한국어 용언 어절 재인에 미치는 어휘 변인의 영향 -모어 화자와 고급 학습자의 예-)

  • Kim, Youngjoo;Lee, Sunjin;Lee, Eun-Ha;Nam, Kichun;Jun, Hyunae;Lee, Sun-Young
    • Journal of Korean language education / v.29 no.3 / pp.25-53 / 2018
  • This study examined the influence of lexical factors on verbal Eojeol recognition. Forty-five L2 Korean learners and twenty-two native Korean speakers performed Eojeol decision tasks, and their responses were analyzed against lexical factors such as 'number of strokes', 'number of consonants and vowels', 'number of syllables', 'number of morphemes', 'whole Eojeol frequency', 'root frequency', 'first-syllable-sharing frequency', and 'number of dictionary meanings'. 'Whole Eojeol frequency' was the strongest predictor of Eojeol recognition reaction time for both native speakers and L2 learners, which supports the full-list model. The other lexical factors that influenced reaction time in L2 learners differed by proficiency level.

A Study On the Relation between Eojeol and Prosodic Phrase (어절 구성과 운율구 형성과의 관계에 대한 연구 - 관형사형 전성어미를 중심으로 -)

  • Park, Mi-Kyoung
    • Proceedings of the KSPS conference / 2004.05a / pp.165-170 / 2004
  • The aim of this paper is to study the relation between Eojeol and prosodic phrase in Korean. Prosodic phrasing differs between the two Korean adnominal ending forms '-ㄴ' and '-ㄹ': 1) In eojeols of 1 to 2 syllables, '-ㄴ' has no prosodic phrase in front of the eojeol and an accentual phrase at the end of the eojeol. In contrast, '-ㄹ' has an accentual phrase in front of the eojeol but none at the end of the eojeol. 2) In eojeols of more than 3 syllables, '-ㄴ' has accentual phrases at the edges of the eojeol, but '-ㄹ' has an accentual phrase at the end of the eojeol.

A postprocessing method for korean optical character recognition using eojeol information (어절 정보를 이용한 한국어 문자 인식 후처리 기법)

  • 이영화;김규성;김영훈;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.35C no.2 / pp.65-70 / 1998
  • In this paper, we detect and correct misrecognized words using Eojeol information. First, after analyzing Korean sentences into Eojeol units, we divide the constituents of an Eojeol into 16 classes. We construct an Eojeol-constituent state diagram from these constituents and derive left-right connectivity information. By inferring the part of speech from the connectivity information, we reduce the number of candidate words and restrict the cases of morphological analysis for a misrecognized Eojeol. We then improve correction speed by using heuristic information, namely the adjacency information between Eojeols. In the correction phase, we construct a reverse-order word dictionary, which lets us search the word dictionary regardless of the position of the misrecognition within a word. The results show an improvement in the recognition rate from 97.03% to 98.02% and in the check rate, and a reduction in candidate words and morphological analysis cases.

Eojeol-Block Bidirectional Algorithm for Automatic Word Spacing of Hangul Sentences (한글 문장의 자동 띄어쓰기를 위한 어절 블록 양방향 알고리즘)

  • Kang, Seung-Shik
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.441-447 / 2000
  • Automatic word spacing is needed to solve the automatic indexing problem for unspaced documents and the space-insertion problem that a character recognition system faces at the end of a line. We propose a word spacing algorithm that automatically finds word spacing positions. It is based on the recognition of Eojeol components using sentence partitioning and a bidirectional longest-match algorithm. Sentence partitioning extracts Eojeol blocks where the Eojeol boundary is relatively clear, and a Korean morphological analyzer is applied bidirectionally to recognize Eojeol components. We tested the algorithm on two sentence groups of about 4,500 Eojeols. The space-level recall ratio was 97.3% and the Eojeol-level recall ratio was 93.2%.
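
The bidirectional longest-match idea named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the paper applies a Korean morphological analyzer to Eojeol components, whereas this sketch assumes a small hypothetical lexicon of known strings, and the tie-breaking rule (fewest unknown pieces, then fewest pieces, then the forward result) is also an assumption.

```python
def longest_match(text, lexicon, reverse=False):
    """Greedy longest-match split of `text`, scanning forward or backward."""
    max_len = max(len(w) for w in lexicon)
    pieces = []
    pos = len(text) if reverse else 0
    while (pos > 0) if reverse else (pos < len(text)):
        room = pos if reverse else len(text) - pos
        for n in range(min(max_len, room), 0, -1):
            piece = text[pos - n:pos] if reverse else text[pos:pos + n]
            # Accept a lexicon entry, or fall back to a single unknown syllable.
            if piece in lexicon or n == 1:
                pieces.append(piece)
                pos += -n if reverse else n
                break
    return pieces[::-1] if reverse else pieces


def bidirectional_match(text, lexicon):
    """Run both scans and keep the one with fewer unknown pieces (assumed rule)."""
    def cost(pieces):
        unknown = sum(p not in lexicon for p in pieces)
        return (unknown, len(pieces))

    forward = longest_match(text, lexicon)
    backward = longest_match(text, lexicon, reverse=True)
    return min(forward, backward, key=cost)  # ties keep the forward result


# Hypothetical toy lexicon; a real system would use a morphological analyzer.
lexicon = {"아버지", "아버지가", "가방에", "방에", "들어가신다"}
print(" ".join(bidirectional_match("아버지가방에들어가신다", lexicon)))
# prints "아버지가 방에 들어가신다"
```

On this classic ambiguous sentence the forward and backward scans disagree ("아버지가 방에" versus "아버지 가방에"), which is the kind of case where a purely one-directional scan can pick the wrong boundary.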

The characteristics of eye-movement during children read Korean texts (어린이 글 읽기에서 나타나는 안구 운동의 특징)

  • Koh, Sung-Ryong;Yoon, So-Jeong;Min, Chul-Hong;Choi, Kyung-Soon;Ko, Sun-Hee;Hwang, Min-A
    • Korean Journal of Cognitive Science / v.21 no.4 / pp.481-503 / 2010
  • In the present study, we examined global and local characteristics of eye movements while 17 Korean third-graders read a Korean story and an expository text. In story reading, children fixated on an eojeol (word cluster) for about 213 ms, made a forward saccade of about 3.6 characters to the next eojeol, and regressed at a rate of 30.8% on average. In expository text reading, children fixated on an eojeol for about 214 ms, made a forward saccade of about 3.3 characters to the next eojeol, and regressed at a rate of 31% on average. In addition, the effects of eojeol length, word frequency, and landing position were examined. Gaze duration was longer on long eojeols than on short eojeols. In a further analysis that excluded repeatedly used eojeols, the eojeol length effect appeared for low-frequency words but seemed to disappear for high-frequency words. In terms of landing position, the eyes seemed to land near the center of an eojeol more frequently than on its boundaries. When the eyes landed at the boundary of an eojeol, they tended to fixate the eojeol again.

Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer (말뭉치와 형태소 분석기를 활용한 한국어 자동 띄어쓰기)

  • Shim, Kwangseob
    • Journal of KIISE / v.42 no.1 / pp.68-75 / 2015
  • This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. In our method, eojeol monograms are used for word spacing, as opposed to the syllable n-grams used in previous studies. The use of a Korean morphological analyzer is limited to the correction of typical word spacing errors. Our method achieves 98.06% syllable accuracy and 94.15% eojeol recall under 10-fold cross-validation on the Sejong corpus, after filtering out non-Hangul eojeols. The processing rate is 250K eojeols, or 1.8 MB, per second on a typical personal computer. Because syllable accuracy and eojeol recall are related to the size of the eojeol dictionary, better performance is expected with a bigger corpus.
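
One way to use eojeol-level counts for spacing is a dynamic program that picks the split with the highest total log-probability. The sketch below is only an illustration under stated assumptions: the abstract does not give the paper's scoring or search procedure, and the `counts` table, the `max_len` limit, and the penalty for an unseen single syllable are all hypothetical.

```python
import math


def space_with_eojeol_counts(text, counts, max_len=8):
    """Insert spaces by maximizing the sum of log relative frequencies of eojeols."""
    total = sum(counts.values())
    unknown = math.log(0.5 / total)  # assumed penalty for one unseen syllable
    best = [0.0] + [-math.inf] * len(text)  # best[i]: best score for text[:i]
    back = [0] * (len(text) + 1)            # back[i]: start of the last piece
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in counts:
                score = math.log(counts[piece] / total)
            elif end - start == 1:
                score = unknown
            else:
                continue  # unseen multi-syllable strings are not proposed
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    pieces, end = [], len(text)
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return " ".join(reversed(pieces))


# Hypothetical counts, as if collected from a correctly spaced raw corpus.
counts = {"나는": 5, "학교": 4, "학교에": 3, "에": 1, "간다": 2}
print(space_with_eojeol_counts("나는학교에간다", counts))  # prints "나는 학교에 간다"
```

In this setup coverage depends directly on how many eojeols the count table contains, which fits the abstract's remark that a bigger corpus should help.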

Cloning of Korean Morphological Analyzers using Pre-analyzed Eojeol Dictionary and Syllable-based Probabilistic Model (기분석 어절 사전과 음절 단위의 확률 모델을 이용한 한국어 형태소 분석기 복제)

  • Shim, Kwangseob
    • KIISE Transactions on Computing Practices / v.22 no.3 / pp.119-126 / 2016
  • In this study, we verified the feasibility of a Korean morphological analyzer that uses a pre-analyzed Eojeol dictionary and a syllable-based probabilistic model. For the verification, two Korean morphological analyzers, MACH and KLT2000, were cloned with a pre-analyzed Eojeol dictionary and a syllable-based probabilistic model. The analysis results of each cloned analyzer were compared with those of the original MACH and KLT2000. The 10-million-Eojeol Sejong corpus was segmented into 10 sets for cross-validation. The 10-fold cross-validated precision and recall were 97.16% and 98.31% for the cloned MACH and 96.80% and 99.03% for the cloned KLT2000. The analysis speed of the cloned MACH was 308,000 Eojeols per second, and that of the cloned KLT2000 was 436,000 Eojeols per second. The experimental results indicate that a Korean morphological analyzer that uses a pre-analyzed Eojeol dictionary and a syllable-based probabilistic model could be used in practical applications.
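
The lookup-then-fallback structure described here, and one common way to score it, can be sketched as follows. This is an assumption-laden illustration: the abstract does not specify the syllable-based probabilistic model, so `fallback` is a placeholder supplied by the caller, and scoring by matching (morpheme, tag) pairs per eojeol is an assumed definition of precision and recall, not necessarily the paper's. The tags in the example are illustrative.

```python
from collections import Counter


def analyze(eojeol, pre_analyzed, fallback):
    """Return the stored analysis for a known eojeol, else defer to `fallback`."""
    if eojeol in pre_analyzed:
        return pre_analyzed[eojeol]
    return fallback(eojeol)  # stand-in for the syllable-based model


def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over (morpheme, tag) pairs."""
    correct = n_pred = n_gold = 0
    for pred, ref in zip(predicted, gold):
        # Multiset intersection counts pairs that appear in both analyses.
        correct += sum((Counter(pred) & Counter(ref)).values())
        n_pred += len(pred)
        n_gold += len(ref)
    return correct / n_pred, correct / n_gold


# Hypothetical dictionary and reference analyses.
pre_analyzed = {"학교에": [("학교", "NNG"), ("에", "JKB")]}
placeholder = lambda e: [(e, "UNK")]  # trivial fallback, for illustration only
predicted = [analyze(e, pre_analyzed, placeholder) for e in ["학교에", "간다"]]
gold = [[("학교", "NNG"), ("에", "JKB")], [("가", "VV"), ("ㄴ다", "EF")]]
print(precision_recall(predicted, gold))  # precision 2/3, recall 1/2
```

A plain dictionary lookup per eojeol is cheap, which is compatible with the high throughput reported in the abstract, though the abstract itself does not attribute the speed to any specific design choice.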

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.89-91 / 2011
  • In this paper, phrase boundaries in a sentence are predicted, and the resulting phrase break information is applied to an HMM-based Korean text-to-speech synthesis system. Synthesis with phrase break information increases the naturalness of the synthetic speech and the understanding of sentences. To predict phrase boundaries, context-dependent information is used, such as the forward/backward POS (part of speech) of an eojeol, the position of an eojeol in a sentence, the length of an eojeol, and the presence or absence of punctuation marks. The experimental results show that the naturalness of synthetic speech increases with phrase break information.
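
The context features listed in this abstract could be assembled per eojeol roughly as below. This is a sketch under assumptions: "forward/backward POS" is read here as the POS of the neighboring eojeols, each eojeol is given a single coarse tag, and the feature names, the `BOS`/`EOS` padding symbols, and the punctuation set are all hypothetical. The predictor that consumes these features is not shown.

```python
PUNCTUATION = ",.?!"


def phrase_break_features(eojeols, pos_tags):
    """Build one feature dict per eojeol for a phrase-break predictor."""
    n = len(eojeols)
    features = []
    for i, (eojeol, pos) in enumerate(zip(eojeols, pos_tags)):
        features.append({
            "pos": pos,
            "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
            "next_pos": pos_tags[i + 1] if i < n - 1 else "EOS",
            # Relative position in the sentence, from 0.0 (first) to 1.0 (last).
            "position": i / (n - 1) if n > 1 else 0.0,
            # Length in syllables, ignoring trailing punctuation.
            "length": len(eojeol.rstrip(PUNCTUATION)),
            "has_punct": eojeol[-1] in PUNCTUATION,
        })
    return features


# Illustrative input with coarse, made-up POS labels.
for row in phrase_break_features(["오늘은,", "날씨가", "좋다."], ["N", "N", "V"]):
    print(row)
```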

Statistical Survey of Vocabulary in Korean Textbook for Elementary School 6th-Grade (초등학교 6학년 국어교과서의 어휘 통계조사)

  • Kim, Jong-Young;Kim, Cheol-Su
    • The Journal of the Korea Contents Association / v.12 no.5 / pp.515-524 / 2012
  • This paper reports statistics such as the total number of syllables, the number of distinct syllables, syllable frequency, the number of eojeols (word phrases unique to the Korean language), the number of distinct eojeols, average eojeol length, eojeol frequency, and parts of speech in four Korean textbooks for 6th-grade students (6-1 Korean Reading, 6-1 Korean Speaking Listening Writing, 6-2 Korean Reading, and 6-2 Korean Speaking Listening Writing). The results of the statistical survey are as follows: the number of Hangul syllables was 194,683; the number of distinct syllables was 1,290; the average syllable frequency was 150.9; the number of eojeols was 70,185; the number of distinct eojeols was 22,647; the average eojeol frequency was 3.1; and the average eojeol length was 2.8 syllables, with the longest consisting of 10 syllables. As for parts of speech, nouns are used more in the Korean Reading textbooks, and verbs are used more in the Korean Speaking Listening Writing textbooks.
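
The counts in this survey can be reproduced on any text with a few lines of code. The sketch below assumes that eojeols are whitespace-delimited tokens and that only precomposed Hangul syllables (U+AC00 to U+D7A3) are counted; the paper's exact counting rules are not given in the abstract, and the function and key names are hypothetical.

```python
from collections import Counter


def is_hangul_syllable(ch):
    """True for precomposed Hangul syllables, U+AC00 through U+D7A3."""
    return "\uac00" <= ch <= "\ud7a3"


def text_statistics(text):
    """Syllable and eojeol statistics for a non-empty, space-delimited text."""
    eojeols = text.split()
    syllables = [ch for ch in text if is_hangul_syllable(ch)]
    lengths = [sum(is_hangul_syllable(ch) for ch in e) for e in eojeols]
    return {
        "syllables": len(syllables),
        "syllable_types": len(set(syllables)),
        "avg_syllable_freq": len(syllables) / len(set(syllables)),
        "eojeols": len(eojeols),
        "eojeol_types": len(Counter(eojeols)),
        "avg_eojeol_freq": len(eojeols) / len(set(eojeols)),
        "avg_eojeol_length": sum(lengths) / len(eojeols),
        "longest_eojeol": max(lengths),
    }


print(text_statistics("나는 학교에 간다 나는 집에 간다"))
```

Under these definitions the averages reported in the abstract follow from its totals: 194,683 / 1,290 ≈ 150.9 occurrences per distinct syllable, 70,185 / 22,647 ≈ 3.1 occurrences per distinct eojeol, and 194,683 / 70,185 ≈ 2.8 syllables per eojeol.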

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how to improve answer extraction in a question-answering (QA) system by using the results of sentence dependency parsing. A QA system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from a document, and various studies have been conducted on both. To improve the performance of answer extraction, the grammatical information of sentences must be reflected accurately. Because Korean has free word order and frequently omits sentence components, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved the performance of answer extraction by adding features generated from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the answer extraction model when given only basic word features generated without dependency parsing against its performance when additionally given the Eojeol tag feature and the dependency graph embedding feature. Since dependency parsing takes the Eojeol, a sentence component delimited by spaces, as its basic unit, the tag information of each Eojeol can be obtained from the parsing result. The Eojeol tag feature is this tag information. Generating the dependency graph embedding consists of two steps: generating the dependency graph from the dependency parsing result and learning the embedding of the graph.
From the dependency parsing result, a graph is generated by mapping each Eojeol to a node, each dependency between Eojeols to an edge, and each Eojeol tag to a node label. In this process, either an undirected or a directed graph is generated, depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, which finds the embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when finding the subgraphs of a graph. If the maximum path length is 1, the graph embedding is generated only from direct dependencies between Eojeols; as the maximum path length grows, the embedding also includes indirect dependencies. In the experiment, the maximum path length between nodes was varied from 1 to 3, with and without considering the direction of dependency, and the performance of answer extraction was measured. The experimental results show that both the Eojeol tag feature and the dependency graph embedding feature improve the performance of answer extraction. In particular, the highest answer extraction performance was obtained when the direction of the dependency relation was considered and the dependency graph embedding was generated with a maximum path length of 1 in the subgraph extraction process of Graph2Vec. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved the performance of answer extraction by adding features based on dependency parsing results, taking into account the characteristics of Korean, namely free word order and frequent omission of sentence components.
Second, we generated features from the dependency parsing result with a learning-based graph embedding method, without defining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing were applied only to the answer extraction model in order to capture meaning. In the future, applying the features to other natural language processing models, such as sentiment analysis or named entity recognition, and confirming their performance would verify the validity of the features more accurately.
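
The graph construction described in this abstract (Eojeol as node, dependency as edge, Eojeol tag as node label, directed or undirected) can be sketched in plain Python. The sketch also lists rooted-subgraph labels by repeated neighbor relabeling, which is the kind of subgraph feature Graph2Vec builds on; treating the abstract's "maximum path length" as the number of relabeling rounds is an assumption, the embedding learning step is not shown, and the head indices and tags in the example are hypothetical.

```python
def build_dependency_graph(heads, directed=True):
    """heads[i] is the index of eojeol i's head, or -1 for the root."""
    adjacency = {i: set() for i in range(len(heads))}
    for dependent, head in enumerate(heads):
        if head >= 0:
            adjacency[dependent].add(head)
            if not directed:
                adjacency[head].add(dependent)
    return adjacency


def rooted_subgraph_labels(adjacency, tags, max_depth=1):
    """Collect node labels after 0..max_depth rounds of neighbor relabeling."""
    labels = dict(enumerate(tags))
    features = list(labels.values())
    for _ in range(max_depth):
        # New label = own label plus the sorted labels of adjacent nodes.
        labels = {
            v: labels[v] + "(" + ",".join(sorted(labels[u] for u in adjacency[v])) + ")"
            for v in adjacency
        }
        features.extend(labels.values())
    return features


# "나는 학교에 간다": both of the first two eojeols depend on the third.
heads = [2, 2, -1]
tags = ["NP_SBJ", "NP_AJT", "VP"]  # illustrative Eojeol tags
directed = build_dependency_graph(heads, directed=True)
undirected = build_dependency_graph(heads, directed=False)
print(rooted_subgraph_labels(directed, tags))
print(rooted_subgraph_labels(undirected, tags))
```

With `max_depth=1` each label sees only direct dependencies, while larger depths fold in indirect ones, mirroring the contrast the abstract draws between maximum path lengths of 1 and 3.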