• Title/Summary/Keyword: Morpheme

Lattice-based Discriminative Approach for Korean Morphological Analysis (래티스상의 구조적 분류에 기반한 한국어 형태소 분석 및 품사 태깅)

  • Na, Seung-Hoon;Kim, Chang-Hyun;Kim, Young-Kil
    • Journal of KIISE:Software and Applications / v.41 no.7 / pp.523-532 / 2014
  • In this paper, we propose a lattice-based discriminative approach for Korean morphological analysis and POS tagging. In our approach, a morpheme lattice is first created for an input sentence from a lexicon, where each node corresponds to a morpheme in the lexicon and each edge connects two consecutive morphemes. A candidate result of morphological analysis is then represented as a path in the morpheme lattice, defined as a sequence of edges that starts in the initial state and ends in the final state. In this setting, morphological analysis is simply the process of finding the best path among all possible paths. Experimental results show that the proposed lattice-based method outperforms the first-order linear-chain CRF.
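
The best-path search described in this abstract can be sketched as a dynamic program over the lattice. This is a minimal illustration, not the authors' model: the toy lexicon, the tag-bigram score table, and the names `build_lattice`, `best_path`, and `toy_edge_score` are all assumptions made for the example, and the paper's discriminative edge scoring is replaced here by a fixed lookup table.

```python
from collections import defaultdict

def build_lattice(sentence, lexicon):
    """Return one node (start, end, surface, tag) per lexicon match in the sentence."""
    nodes = []
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            surface = sentence[start:end]
            for tag in lexicon.get(surface, ()):
                nodes.append((start, end, surface, tag))
    return nodes

def best_path(sentence, lexicon, edge_score):
    """Find the highest-scoring path from the initial to the final state."""
    n = len(sentence)
    bos = (0, 0, "<s>", "BOS")    # initial state
    eos = (n, n, "</s>", "EOS")   # final state
    # Sorting by start position guarantees predecessors are scored first.
    nodes = sorted(build_lattice(sentence, lexicon), key=lambda node: node[0]) + [eos]
    ends_at = defaultdict(list)   # position -> reachable nodes ending there
    ends_at[0].append(bos)
    best = {bos: (0.0, None)}     # node -> (best score, back-pointer)
    for node in nodes:
        # An edge exists between two morphemes that are adjacent in the input.
        options = [(best[prev][0] + edge_score(prev, node), prev)
                   for prev in ends_at[node[0]]]
        if options:
            best[node] = max(options, key=lambda option: option[0])
            ends_at[node[1]].append(node)
    if eos not in best:
        return None               # no complete analysis in the lattice
    path, node = [], best[eos][1]
    while node is not None and node != bos:
        path.append((node[2], node[3]))
        node = best[node][1]
    return path[::-1]

# Hypothetical toy data: tag-bigram scores stand in for learned edge scores.
TAG_BIGRAM = {
    ("BOS", "NP"): -0.5, ("NP", "JX"): -0.2, ("JX", "EOS"): -0.1,
    ("BOS", "VV"): -1.5, ("VV", "ETM"): -0.3, ("ETM", "EOS"): -2.0,
}

def toy_edge_score(prev, node):
    return TAG_BIGRAM.get((prev[3], node[3]), -10.0)

toy_lexicon = {"나": ["NP", "VV"], "는": ["JX", "ETM"]}
analysis = best_path("나는", toy_lexicon, toy_edge_score)
print(analysis)
```

With these toy scores the pronoun-plus-particle reading wins over the verb-plus-ending reading; a trained model would supply the edge scores instead of the table.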

Korean Homograph Tagging Model based on Sub-Word Conditional Probability (부분어절 조건부확률 기반 동형이의어 태깅 모델)

  • Shin, Joon Choul;Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering / v.3 no.10 / pp.407-420 / 2014
  • In general, Korean morpheme analysis is divided into two steps. In the first step, the ambiguity generation step, an Eojeol is analyzed into many candidate morpheme sequences. In the second step, one appropriate candidate is chosen using contextual information. A Hidden Markov Model (HMM) is typically applied in the second step. This paper proposes the Sub-word Conditional Probability (SCP) model as an alternative algorithm. SCP first uses sub-word information from the adjacent Eojeol. If that fails, SCP falls back to morpheme information in a restricted way. In a comparative test of accuracy and speed, the HMM's accuracy was 96.49% and SCP's accuracy was only 0.07% lower, while SCP reduced processing time by 53%.
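
The two-stage back-off in this abstract (sub-word information first, morpheme information second) can be sketched roughly as follows. The abstract does not define the sub-word units or the probability estimates, so this example assumes that sub-words are prefixes of the adjacent Eojeol, tried longest first, and that probabilities are relative frequencies from co-occurrence counts. The function name `choose_candidate`, the count tables, and the sense labels are hypothetical.

```python
def choose_candidate(candidates, neighbor_eojeol, neighbor_morphemes,
                     subword_counts, morpheme_counts):
    """Pick a candidate analysis and report which evidence level decided it.

    P(candidate | context) = count(candidate, context) / count(context), so for
    a fixed context the argmax over candidates equals the argmax over counts.
    """
    # Step 1 (assumption): sub-words are prefixes of the adjacent Eojeol.
    for length in range(len(neighbor_eojeol), 0, -1):
        context = neighbor_eojeol[:length]
        counts = {c: subword_counts.get((c, context), 0) for c in candidates}
        if sum(counts.values()) > 0:
            return max(candidates, key=counts.get), "subword"
    # Step 2: back off to the morphemes of the adjacent Eojeol.
    for morpheme in neighbor_morphemes:
        counts = {c: morpheme_counts.get((c, morpheme), 0) for c in candidates}
        if sum(counts.values()) > 0:
            return max(candidates, key=counts.get), "morpheme"
    # No evidence at either level: keep the first candidate.
    return candidates[0], "default"

# Hypothetical counts for the homograph 배 ("pear" vs. "boat").
senses = ["배/pear", "배/boat"]
subword_counts = {("배/pear", "먹"): 5, ("배/boat", "탔"): 7}
morpheme_counts = {("배/boat", "타"): 3}

by_subword = choose_candidate(senses, "먹었다", ["먹", "었", "다"],
                              subword_counts, morpheme_counts)
by_morpheme = choose_candidate(senses, "타고", ["타", "고"],
                               subword_counts, morpheme_counts)
print(by_subword, by_morpheme)
```

The first call is decided by the prefix 먹 of the neighboring Eojeol; the second finds no sub-word evidence and falls back to the morpheme 타.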

Early Locus of a Linguistic Variable in Word Recognition (단어재인 초기단계에서의 언어학적 변인의 역할)

  • Lee, Chang H.
    • Proceedings of the Korean Society for Cognitive Science Conference / 2002.05a / pp.105-110 / 2002
  • The syllable and the morpheme are known to be important linguistic variables. This study used the fast priming task to examine whether these variables are activated at an early stage of word recognition. With mixed-case primes, Experiments 1 and 2 revealed effects of the syllable and the morpheme at a short SOA (Stimulus Onset Asynchrony), but not at a long SOA. Using the same manipulation, Experiments 3 and 4 found a significant syllable effect at the short SOA, but not at the long SOA. The study showed that the syllable plays a role at an early stage of word recognition.

Korean deadjectival inchoatives and measure phrases: a compositional study

  • Lim, Dongsik
    • Language and Information / v.20 no.1 / pp.73-91 / 2016
  • Korean adjectives in general cannot combine with measure phrases (MP), but MPs are compatible with adjectives when they appear with the inchoative morpheme -(e)ci. In this case, MPs can only denote the difference between two states along the dimension denoted by the root adjective. To account for this, this paper proposes that i) -(e)ci is a spell-out of V in the directed motion construction which takes an abstract path argument, like become, and ii) this path argument contains a comparative morpheme. By assuming this we can explain why MPs appear with -(e)ci, as well as other interesting phenomena such as variable telicity in deadjectival verbs with -(e)ci.

The Acquisition of Korean Grammatical Morphemes in Early Childhood (한국아동이 초기에 획득한 문법적 형태소의 종류 및 획득 시기)

  • Yi, Soon-Hyung
    • Korean Journal of Child Studies / v.21 no.4 / pp.51-68 / 2000
  • To reveal when toddlers and children acquire the grammatical morphemes of Korean, this study investigated how they respond to a set of picture tasks. The participants were 174 children aged 18 to 60 months, selected from two child-care centers in Seoul and Gyeonggi Province. Statistical analysis of the data showed that 2- and 3-year-old children acquire most of the grammatical morphemes, such as nouns, pronouns, verbs, adverbs, adjectives, and interrogative terms. The significant differences in acquisition among the six age groups confirm the hypothesis that grammatical morphemes are acquired gradually.

Korean broadcast news transcription system with out-of-vocabulary(OOV) update module (한국어 방송 뉴스 인식 시스템을 위한 OOV update module)

  • Jung Eui-Jung;Yun Seung
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.33-36 / 2002
  • We implemented a Korean broadcast news transcription system that is robust to out-of-vocabulary (OOV) words and tested its performance. OOV words in the input speech are inevitable in large vocabulary continuous speech recognition (LVCSR). The known vocabulary will never be complete because of, for instance, neologisms, proper names, and, in some languages, compounds. The fixed vocabulary and language model of an LVCSR system are directly confronted with these OOV words. Our broadcast news recognition system therefore has an offline OOV update module for the language model and vocabulary to address the OOV problem, and it uses a morpheme-based recognition unit (the so-called pseudo-morpheme) for OOV robustness.

Sentiment Analysis on Movie Reviews Using Word Embedding and CNN (워드 임베딩과 CNN을 사용하여 영화 리뷰에 대한 감성 분석)

  • Ju, Myeonggil;Youn, Seongwook
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.1 / pp.87-97 / 2019
  • As social network services grow, people's reactions to specific events are given increasing weight. Previous research on social network services predicted trends in topics of interest by assigning scores to sentences written by users. Building on that work, we studied sentiment analysis of social network service sentences, predicting each result as positive or negative, with movie reviews as the data. We used movie reviews in order to obtain high accuracy. For training, we classified each movie review as positive or negative based on its score. We also performed embedding and morpheme analysis on the movie reviews. By applying the trained model to social network service text, we could predict each result as positive or negative, encoded as 0 or 1. Experimental results show an accuracy of about 80% in predicting whether a sentence is positive or negative.

Study on Chinese Character Borrowing in Korean Language (우리말 중 한자차용 실태 고찰 - 중국어의 한자차용 사례와의 비교를 중심으로)

  • PARK, SEOK HONG
    • Cross-Cultural Studies / v.33 / pp.359-384 / 2013
  • In Korean, syllables, morphemes, and words are sometimes replaced with Chinese characters. This phenomenon is called Chinese Character Borrowing, and the Chinese characters used are called Borrowed Chinese Characters. Whereas character borrowing in Chinese borrows only the sound in order to write a different word, character borrowing in Korean assigns a new meaning. As a result, a syllable that originally had no meaning acquires one, and the meanings of morphemes and words change. In advertisements and campaigns, Chinese Character Borrowing has many linguistic advantages, such as visual immediacy and effective expression of meaning. However, a number of cases violate grammar rules and word-formation conventions. For this reason, Chinese Character Borrowing, along with other foreign words, has the problem of polluting Korean. This paper therefore studies the Chinese Character Borrowing phenomenon in Korean and analyzes its effectiveness and its impact on Korean. It also analyzes the problems of Borrowed Chinese Characters and suggests several alternatives for the correct use of Korean.

KNE: An Automatic Dictionary Expansion Method Using Use-cases for Morphological Analysis

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of information and communication convergence engineering / v.17 no.3 / pp.191-197 / 2019
  • Morphological analysis is used for searching sentences and understanding context. As most morpheme analysis methods are based on predefined dictionaries, the problem of a target word not being registered in the given morpheme dictionary, the so-called unregistered word problem, can be a major cause of reduced performance. The current practical solution to the unregistered word problem is to add such words to the dictionary by hand. This manual approach restricts the scalability and expandability of dictionaries. To overcome this limitation, we propose a novel method that automatically expands a dictionary by means of use-case analysis, which checks the validity of an unregistered word by exploring its use-cases through web crawling. The results show that the proposed method is feasible in terms of the accuracy of the validation process, the expandability of the dictionary, and, after registration, the fast extraction time of morphemes.

Proper Noun Embedding Model for the Korean Dependency Parsing

  • Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
    • Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
  • Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing based on word representations in a continuous vector space. However, this causes mislabeled tags for proper nouns that rarely appear in the training corpus, because it is difficult to express out-of-vocabulary (OOV) words in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods according to the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train them on contextual features using a multi-layer bidirectional LSTM. Two models, one with a syllable-based unit and one with a morpheme-based unit, are proposed for proper noun embedding, and the ensemble model improves dependency parsing performance more than either the syllable-based or the morpheme-based embedding model alone. The experimental results showed that our ensemble model improved UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.
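
The preprocessing step in this abstract, replacing proper nouns with a special token before embedding, can be sketched as below. This covers only the token replacement and the two unit choices, not the LSTM or the parser. The token string `<PN>`, the function names, and the use of the `NNP` tag to mark proper nouns are assumptions for illustration; the abstract does not specify them.

```python
PROPER_NOUN_TOKEN = "<PN>"  # hypothetical special token

def mask_proper_nouns(morphemes, proper_tag="NNP"):
    """Replace the surface form of every proper noun with the special token."""
    return [(PROPER_NOUN_TOKEN if tag == proper_tag else form, tag)
            for form, tag in morphemes]

def to_morpheme_units(morphemes):
    """Morpheme-based units: one unit per morpheme."""
    return [form for form, _tag in morphemes]

def to_syllable_units(morphemes):
    """Syllable-based units: split each form into syllables, keep the token whole."""
    units = []
    for form, _tag in morphemes:
        if form == PROPER_NOUN_TOKEN:
            units.append(form)
        else:
            units.extend(form)  # each Hangul syllable block is one character
    return units

# Hypothetical tagged sentence: "철수가 학교에 갔다".
tagged = [("철수", "NNP"), ("가", "JKS"), ("학교", "NNG"),
          ("에", "JKB"), ("갔", "VV"), ("다", "EF")]
masked = mask_proper_nouns(tagged)
morpheme_units = to_morpheme_units(masked)
syllable_units = to_syllable_units(masked)
print(morpheme_units)
print(syllable_units)
```

Because every proper noun maps to the same token, a rare name no longer needs its own vector; the model sees only the contexts in which the token occurs.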