• Title/Summary/Keyword: grammatical morpheme

Search Results: 18

The Analysis of Endings Which Begin with 'a/e' in a Korean Morphological Analyzer (한국어 형태소 분석기에서 '아/어'로 시작되는 어미의 분석)

  • 강승식;김영택
    • Korean Journal of Cognitive Science
    • /
    • v.3 no.1
    • /
    • pp.25-39
    • /
    • 1991
  • When an ending that begins with 'a/e' combines with a stem, the 'a/e' can be deleted. In particular, when it combines with an h-irregular verb, it is realized as a variant such as '-a-', '-e-', '-ia-', or '-ie-'. To analyze these variants of 'a/e', we propose a grammatical morpheme dictionary represented as a binary tree, along with several procedures that process the variants, so that the unexpected errors that occur frequently when analyzing Korean word phrases can be removed.
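The binary-tree dictionary plus variant-restoration idea can be sketched as follows. This is a minimal illustration, not the authors' actual data structure: the romanized endings, their tags, and the variant table are all assumptions made for the example.

```python
# Sketch: grammatical-morpheme dictionary stored as a binary search tree,
# with h-irregular variant prefixes restored to the canonical 'a/e' form.
# Endings, tags, and the variant table are illustrative assumptions.

class Node:
    def __init__(self, key, info):
        self.key, self.info = key, info
        self.left = self.right = None

def insert(root, key, info):
    if root is None:
        return Node(key, info)
    if key < root.key:
        root.left = insert(root.left, key, info)
    elif key > root.key:
        root.right = insert(root.right, key, info)
    return root

def search(root, key):
    while root is not None:
        if key == root.key:
            return root.info
        root = root.left if key < root.key else root.right
    return None

# Canonical endings that begin with 'a/e' (hypothetical romanizations).
root = None
for ending, tag in [("a", "ENDING:a/e"), ("a-se", "ENDING:a/e+se"),
                    ("a-do", "ENDING:a/e+do")]:
    root = insert(root, ending, tag)

# Variant prefixes produced by h-irregular stems map back to canonical 'a'.
VARIANTS = {"e": "a", "ia": "a", "ie": "a"}

def analyze(surface):
    """Look up the surface form; on a miss, try its canonical restoration."""
    hit = search(root, surface)
    if hit is not None:
        return hit
    for var, canon in VARIANTS.items():
        if surface.startswith(var):
            hit = search(root, canon + surface[len(var):])
            if hit is not None:
                return hit
    return None
```

With this layout, a variant form such as 'e-se' is found by restoring it to the canonical 'a-se' entry before the tree lookup, which is the error-avoiding step the abstract describes.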

Sentiment Analysis using Robust Parallel Tri-LSTM Sentence Embedding in Out-of-Vocabulary Word (Out-of-Vocabulary 단어에 강건한 병렬 Tri-LSTM 문장 임베딩을 이용한 감정분석)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.10 no.1
    • /
    • pp.16-24
    • /
    • 2021
  • Existing word embedding methods such as word2vec represent only the words that occur in the raw training corpus as fixed-length vectors in a continuous vector space, so in a morphologically rich language the out-of-vocabulary (OOV) problem often arises when mapping words to fixed-length vectors. Even for sentence embedding, when the meaning of a sentence is represented as a fixed-length vector by composing the word vectors that constitute it, OOV words make it difficult to represent the sentence meaningfully. In particular, since Korean is an agglutinative language whose words combine lexical and grammatical morphemes, handling OOV words is an important factor in improving performance. In this paper, we propose a parallel Tri-LSTM sentence embedding that is robust to the OOV problem by extending the use of word-level morphological information to the sentence level. On a sentiment analysis task with a Korean corpus, we empirically found that the character unit is better than the morpheme unit as the embedding unit for Korean sentence embedding. We achieved 86.17% accuracy on the sentiment analysis task with the parallel bidirectional Tri-LSTM sentence encoder.
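The OOV contrast between word-unit and character-unit embedding can be shown in a few lines. The tiny vocabularies below are assumptions for illustration only; the paper's actual model and data are not reproduced here.

```python
# Minimal illustration of why character-unit lookup avoids OOV for
# Korean-like agglutinative forms. Both vocabularies are assumptions.

word_vocab = {"먹다": 0, "먹고": 1}                      # word-level training vocabulary
char_vocab = {c: i for i, c in enumerate("먹다고었")}    # character inventory

def word_ids(token):
    """Word-level lookup: an unseen inflected form is out of vocabulary."""
    return word_vocab.get(token)           # None => OOV

def char_ids(token):
    """Character-level lookup: any form built from known characters maps."""
    return [char_vocab[c] for c in token if c in char_vocab]

# '먹었다' (past tense of '먹다', "to eat") never occurred as a whole word,
# so it is OOV at the word level but fully covered at the character level.
```

This is the motivation behind the paper's finding that character units beat morpheme units as the embedding unit for Korean.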

Korean Part-Of-Speech Tagging by using Head-Tail Tokenization (Head-Tail 토큰화 기법을 이용한 한국어 품사 태깅)

  • Suh, Hyun-Jae;Kim, Jung-Min;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.11 no.5
    • /
    • pp.17-25
    • /
    • 2022
  • Korean part-of-speech taggers decompose a compound morpheme into unit morphemes and attach part-of-speech tags. This has the disadvantage that morpheme-level part-of-speech tags are over-classified in detail and complex word types are generated depending on the purpose of the tagger. When a part-of-speech tagger is used for keyword extraction in deep learning based language processing, it is not necessary to decompose compound particles and verb endings. In this study, the part-of-speech tagging problem is simplified by a Head-Tail tokenization technique that divides a word into only two types of tokens, a lexical morpheme part and a grammatical morpheme part, which solves the problem of excessively decomposed morphemes. Part-of-speech tagging was attempted with a statistical technique and a deep learning model on the Head-Tail tokenized corpus, and the accuracy of each model was evaluated: the TnT tagger, a statistics-based part-of-speech tagger, and a Bi-LSTM tagger, a deep learning based part-of-speech tagger, were both trained on the Head-Tail tokenized corpus to measure part-of-speech tagging accuracy. As a result, the Bi-LSTM tagger performed part-of-speech tagging with a high accuracy of 99.52%, compared to 97.00% for the TnT tagger.
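The two-token split can be sketched as below. This is a hedged sketch under stated assumptions: the tiny tail inventory and the longest-match rule are illustrative stand-ins, not the paper's actual boundary-detection method.

```python
# Sketch of Head-Tail tokenization: each eojeol (word phrase) is split
# into exactly two tokens, a lexical head and a grammatical tail.
# The tail inventory and longest-match rule are illustrative assumptions.

TAILS = ["에서는", "에서", "에게", "으로", "는", "은", "이", "가", "를", "을"]
TAILS.sort(key=len, reverse=True)  # try the longest grammatical tail first

def head_tail(eojeol):
    """Return (head, tail); tail is '' when no grammatical part matches."""
    for tail in TAILS:
        if eojeol.endswith(tail) and len(eojeol) > len(tail):
            return eojeol[:-len(tail)], tail
    return eojeol, ""
```

Because every word yields at most one head and one tail, the tag set a downstream tagger must learn stays small, which is the simplification the abstract claims.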

Authorship Attribution of Web Texts with Korean Language Applying Deep Learning Method (딥러닝을 활용한 웹 텍스트 저자의 남녀 구분 및 연령 판별 : SNS 사용자를 중심으로)

  • Park, Chan Yub;Jang, In Ho;Lee, Zoon Ky
    • Journal of Information Technology Services
    • /
    • v.15 no.3
    • /
    • pp.147-155
    • /
    • 2016
  • With the rapid development of technology, web text is growing explosively and attracting attention in many fields as a substitute for surveys. Facebook reaches up to 113 million users per month, and Twitter is used by various institutions and companies as a behavioral analysis tool. However, much research has focused on the meaning of the text itself, and there has been little study of who created the text. Therefore, this research classifies author sex and age using the posts of 20,187 Facebook users who revealed their sex and age. It applies Convolutional Neural Networks, a type of deep learning algorithm that recently came into the spotlight as an image classifier, to web text analysis. The results showed 92% accuracy, confirming its potential as a text classifier. In addition, this research minimized Korean morpheme analysis and performed authorship attribution directly on Korean web text. Based on these features, this study can support applications such as web text management as an information resource for practitioners and non-grammatical text analysis systems for researchers. Thus, this study proposes a new method for web text analysis.

Sentence interpretation strategies by typically developing and late-talking Korean toddlers (말 늦은 아동의 문장 이해 전략)

  • Jo, Sujung;Hwang, Mina;Choi, Kyung-Soon
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.13-21
    • /
    • 2014
  • Late talkers are young children who are delayed in their expressive language skills despite normal nonverbal cognitive ability, adequate hearing, and typical personality development. The purpose of this study is to investigate the sentence interpretation strategies used by Korean-speaking late talkers and age-matched normal children. Nine late talkers and nine normal children matched by age at 30-35 months participated in this study. Twenty-seven simple noun-noun-verb (NNV) sentences were generated by the factorial combination of case marker [nominative case marker on the first noun and accusative on the second (C1), accusative on the first noun and nominative on the second (C2), and no case markers on both nouns (C0)] and animacy of the nouns [animate-inanimate (AI), inanimate-animate (IA), animate-animate (AA)]. All the children were asked to "act out" their interpretation of each sentence. For each sentence type, the percentage of choices of the first noun as the agent was calculated. A group (2) × animacy (3) × case-marker (3) mixed ANOVA showed significant main effects for animacy and case marker and a significant group (2) × case-marker (3) interaction. The late talkers relied on semantic (animacy) cues in their interpretation of the sentences, while their normal peers utilized both animacy and grammatical morpheme (case-marker) cues. The results indicate that the late talkers' comprehension skills were also delayed.

The Comprehension and Production of Tense Markings in Language Delayed Children and Typically Developing Children (언어발달지체아동과 일반아동의 시제 표지 이해 및 산출 특성)

  • Jo, Miok;Choi, Soyoung;Hwang, Mina
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.123-131
    • /
    • 2014
  • The purpose of this study is to investigate the comprehension and production of various tense markings in Korean-speaking children with and without language delay. Thirty children with language delay (LD) and 30 typically developing (TD) children participated in the study. In each group, half were 4 years old and the other half 7 years old. In both the comprehension and the production task, 28 verbs carrying four types of tense markings were used: the past tense '-et ta', two present progressives '-ko itta' and '-enta', and the future tense '-elyeko hanta'. In the comprehension task, the children were presented with three printed still scenes from a video recording of a verb action, representing the future, present progressive, and past tense of the verb, respectively. They then listened to the action verb with one of the four tense markings and had to pick the scene that matched the verb tense. In the production task, the children were given one of the three scenes and asked to produce the verb with the appropriate tense marking. In both tasks, the LD children performed significantly worse than the TD children, and the older children performed significantly better than the younger children. Interestingly, the pattern of performance across the different types of tense markings at the two language-age levels was closely similar in the LD and TD children. This similarity between groups appeared stronger in the comprehension task than in the production task.

Suggestions for the Development of RegTech Based Ontology and Deep Learning Technology to Interpret Capital Market Regulations (레그테크 기반의 자본시장 규제 해석 온톨로지 및 딥러닝 기술 개발을 위한 제언)

  • Choi, Seung Uk;Kwon, Oh Byung
    • The Journal of Information Systems
    • /
    • v.30 no.1
    • /
    • pp.65-84
    • /
    • 2021
  • Purpose: Based on the development of artificial intelligence and big data technologies, RegTech has emerged to reduce regulatory costs and to enable efficient supervision by regulatory bodies. The word RegTech is a combination of regulation and technology, and means using technological methods to facilitate the implementation of regulations and to make the surveillance and supervision of regulations efficient. The purpose of this study is to describe the recent adoption of RegTech and to provide basic examples of applying RegTech to capital market regulations. Design/methodology/approach: English-based ontology and deep learning technologies are quite developed in practice, and it will not be difficult to extend them to European or Latin American languages that are grammatically similar to English. However, it is not easy to use them in most Asian languages, such as Korean, which have different grammatical rules. In addition, in the early stages of adoption, companies, financial institutions, and regulators will not be familiar with this machine-based reporting system, so there is a need to establish an ecosystem that facilitates the adoption of RegTech by consulting and supporting the stakeholders. In this paper, we provide a simple example that shows a procedure for applying RegTech to recognize and interpret Korean-language capital market regulations. Specifically, we present the process of converting sentences in regulations into a meta-language through morpheme analysis, and we then conduct deep learning analyses to determine whether a regulatory sentence exists in each regulatory paragraph. Findings: This study illustrates the applicability of RegTech-based ontology and deep learning technologies to Korean-based capital market regulations.
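The described pipeline can be sketched at a very small scale. Everything here is a labeled assumption: the whitespace split stands in for a real morpheme analyzer, the meta-language table is invented for the example, and a simple obligation-marker rule stands in for the paper's deep learning classifier.

```python
# Hedged sketch of the pipeline: regulation sentences are converted into
# a meta-language via morpheme-level mapping, then each paragraph is
# checked for a regulatory (obligation) sentence. The mapping table and
# the rule below are illustrative stand-ins, not the paper's models.

META = {"하여야": "SHALL", "한다": "DO", "아니": "NOT"}

def to_meta(sentence):
    """Map each token to a meta-language symbol; whitespace split stands
    in for a real morpheme analyzer, unknown tokens become LEX."""
    return [META.get(tok, "LEX") for tok in sentence.split()]

def is_regulatory(sentence):
    """Stand-in for the classifier: an obligation marker => regulatory."""
    return "SHALL" in to_meta(sentence)

def paragraph_has_regulation(paragraph):
    """Flag a paragraph if any of its sentences is regulatory."""
    return any(is_regulatory(s) for s in paragraph.split("."))
```

A real system would replace both the token map (with an ontology-backed morpheme analyzer) and the rule (with a trained classifier over the meta-language sequences), but the data flow is the same.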

On the base inflectional forms of Korean old vernacular letters (언간에 나타나는 어기활용형에 대한 고찰)

  • Lee, Hyun-Ju
    • (The)Study of the Eastern Classic
    • /
    • no.56
    • /
    • pp.297-329
    • /
    • 2014
  • This paper aims to examine the base inflectional forms found in Korean old vernacular letters and to explain why they appear with such frequency. In these letters, forms in which the suffix 'ha-' and the ending of the 'Base + ha-' adjective derivation do not appear occur with extraordinary frequency; I call these the base inflectional forms. I consider them in terms of function and of their morphological and syntactic constructions. Whenever Joseon-era people wrote a letter under a time limit, they needed to reduce the effort of using the brush; therefore the base inflectional forms appear with extraordinary frequency in comparison with other documents. In the 'X ha-' word formation of Korean old vernacular letters, 'ha-' is a formal morpheme without substantial meaning, so 'X' remains and 'ha-' together with the ending can readily be omitted. The base inflectional forms arise from voluntary language performance with a particular intention, but they do not appear under all conditions; they appear only in certain circumstances. I examined the constructions in which the base inflectional forms occur: in the 'X ha-' word formation, 'X' is invariably a predicative base, and the ending that takes part in the base inflectional forms has a purely grammatical function.