• Title/Summary/Keyword: Korean adverbs (한국어 부사)

Search Results: 78

A Contrastive Study of Restrictive Function Words in Korean and Chinese: Focusing on Korean '만, 밖에, 뿐' and Chinese '只, 光, 僅' (한·중 한정 기능어 대조 연구 -한국어 '만, 밖에, 뿐'과 중국어 '지(只), 광(光), 근(僅)'을 중심으로-)

  • Jeong, Bi
    • 중국학논총
    • /
    • no.62
    • /
    • pp.49-69
    • /
    • 2019
  • This study adopts a usage-pattern methodology to contrast the Korean auxiliary particles '만, 밖에, 뿐' with the Chinese range adverbs '只, 光, 僅'. Using corpora built from actual language data produced by native speakers of Korean and Chinese, we examined the usage patterns of each form. For each of the six items, 300 sentences were sampled, yielding an analytical corpus of 1,800 sentences in total. Analysis of these examples reveals features such as the appearance ratio and appearance environment of the Korean and Chinese forms, and the Korean and Chinese results are then compared to identify commonalities and differences.
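The counting step behind the appearance-ratio analysis can be pictured with the minimal Python sketch below, which tallies how often each marker appears in a small sample of sentences. The toy sentences and the use of plain substring matching (rather than morphological analysis) are assumptions for illustration only, not the paper's procedure.

```python
# Minimal illustrative sketch (not from the paper): counting sentences in which
# each restrictive marker occurs, mirroring the per-marker sampling above.
# The sample sentences are invented; real work would use a morphologically
# analyzed corpus rather than plain substring matching.
from collections import Counter

KO_MARKERS = ["만", "밖에", "뿐"]   # Korean auxiliary particles
ZH_MARKERS = ["只", "光", "僅"]     # Chinese range adverbs

def marker_counts(sentences, markers):
    """Count the sentences in which each marker appears at least once."""
    counts = Counter({m: 0 for m in markers})
    for sent in sentences:
        for m in markers:
            if m in sent:
                counts[m] += 1
    return counts

sample = ["나는 너만 믿는다.", "돈이 천 원밖에 없다.", "他只有一本书。", "光说不做。"]
print(marker_counts(sample, KO_MARKERS))
print(marker_counts(sample, ZH_MARKERS))
```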

Improving Recall for Context-Sensitive Spelling Correction Rules Through Integrated Constraint Loosening Method (통합적 제약완화 방식을 통한 한국어 문맥의존 철자오류 교정규칙의 재현율 향상)

  • Choi, Hyunsoo;Yoon, Aesun;Kwon, Hyukchul
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.6
    • /
    • pp.412-417
    • /
    • 2015
  • Context-sensitive spelling errors (CSSEs) are hard to correct, since each is a perfectly valid word when analyzed in isolation. Because they can be detected only by considering the semantic and syntactic relations of their context, CSSEs largely determine the performance of spelling and grammar checkers. The existing Korean Spelling and Grammar Checker (KSGC 4.5) adopts a rule-based method that uses hand-made correction rules for CSSEs. With this rule-based method, KSGC 4.5 is designed to obtain very high precision, which results in extremely low recall. In this paper, we integrate our previous works that control the CSSE correction rules in order to improve recall without sacrificing precision. In addition to this integration, facultative insertion of adverbs and the conjugation suffixes of predicates are also considered as constraint-loosening linguistic features.
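To make the constraint-loosening idea concrete, here is a hypothetical sketch of a context-sensitive correction rule whose right-context check tolerates one optional intervening adverb. The 결재/결제 confusion pair, the word lists, and the rule itself are illustrative assumptions, not the actual KSGC 4.5 rules.

```python
# Hypothetical rule sketch (not the KSGC 4.5 implementation): correct the
# CSSE pair 결재(approval)/결제(payment) when a payment cue follows, allowing
# one facultative adverb between the target word and its context cue.
ADVERBS = {"바로", "이미", "모두"}            # adverbs the loosened rule may skip
PAYMENT_CUES = {"했다", "됩니다", "하세요"}   # toy right-context cues

def correct(tokens):
    """Return a copy of tokens with the toy correction rule applied."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok != "결재":
            continue
        j = i + 1
        if j < len(tokens) and tokens[j] in ADVERBS:   # constraint loosening
            j += 1
        if j < len(tokens) and tokens[j] in PAYMENT_CUES:
            out[i] = "결제"
    return out

print(correct(["카드로", "결재", "바로", "했다"]))
# -> ['카드로', '결제', '바로', '했다']
```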

Construction of Korean Wordnet "KorLex 1.5" (한국어 어휘의미망 "KorLex 1.5"의 구축)

  • Yoon, Ae-Sun;Hwang, Soon-Hee;Lee, Eun-Ryoung;Kwon, Hyuk-Chul
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.1
    • /
    • pp.92-108
    • /
    • 2009
  • The Princeton WordNet (PWN), which has been developed over the last 20 years since the mid-1980s, aims at representing the mental lexicon inside the human mind. Its potential, applicability, and portability have been more appreciated in the fields of NLP and KE than in cognitive psychology. Semantic and knowledge processing is indispensable for obtaining useful information from human languages in CMC and HCI environments, and the PWN can provide such NLP-based systems with 'concrete' semantic units and their network. Referenced to the PWN, about 50 wordnets for different languages have been developed during the last 10 years, and they enable a variety of multilingual processing applications. This paper describes the PWN-referenced Korean wordnet, KorLex 1.5, which was developed from 2004 to 2007 and currently contains about 130,000 synsets and 150,000 word senses for nouns, verbs, adjectives, adverbs, and classifiers.
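For readers unfamiliar with wordnet structure, the sketch below shows a minimal synset record with hypernym links, the kind of 'concrete semantic units and their network' the abstract refers to. The field names and example entries are invented for illustration and do not reflect the actual KorLex data model.

```python
# Illustrative sketch only: a toy synset with hypernym links, of the kind a
# PWN-referenced wordnet such as KorLex organizes into a network.
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str
    pos: str                               # e.g. noun, verb, adjective, adverb
    lemmas: list
    gloss: str
    hypernyms: list = field(default_factory=list)

animal = Synset("toy-n-0002", "noun", ["동물"], "a living, moving organism")
dog = Synset("toy-n-0001", "noun", ["개"], "a domesticated canine", [animal])

def hypernym_chain(synset):
    """Walk upward through the first hypernym link of each synset."""
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        yield synset

print([s.lemmas for s in hypernym_chain(dog)])   # [['동물']]
```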

A Contrastive Study on Korean and Chinese Passive Expression: Centered on Korean Act Subject Marks and Chinese Passive Marks (한·중 피동 표현 대조 연구 - 한국어 행위주 표지와 중국어 피동 표지 대비 중심으로 -)

  • Yu, Tong-Tong;Kim, In-Kyun
    • Cross-Cultural Studies
    • /
    • v.47
    • /
    • pp.217-240
    • /
    • 2017
  • This paper is based on a comparative analysis of the Korean act-subject marks '-에게(한테), -로, -에' and the Chinese passive marks '被(bèi)/让(ràng)/叫(jiào)/给(gěi)'. The distributional behavior and characteristics of each form were examined, and their corresponding relationships were analyzed through comparison. The comparison focused on three aspects: the typological characteristics of the two languages, selectional restrictions on the 'act subject' or 'passive subject' in the passive expression, and constraints on the use of the act-subject (passive) marks in the Korean passive expression formed with '받다'. In this analysis, the Korean act-subject markers '-에게(한테), -로, -에' and the Chinese passive markers '被/让/叫/给' always combine with the act subject and function adverbially in the passive expression. Despite this common point, some differences emerged in how the two languages use them. First, the way the 'act subject' is conjoined to the passive expression follows the characteristics of each language. In addition, the act-subject marks of the Korean passive expression, '에게/한테, -에/로', serve only as particles and are restricted only by the [±animate] feature of the act subject, whereas the Chinese passive markers '被/让/叫/给' are often restricted by the [±animate] feature of the passive subject and by the presence or absence of an act subject. In the Korean passive expression with '받다', the act-subject marks '에게/한테, -에/로' can be used, but the Chinese passive marks are restricted by the meaning of the lexical items in the sentence.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.309-328
    • /
    • 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs, BART and GPT, was compared on the task of abstractive text summarization. To investigate how performance depends on the characteristics of the inference data, ten different document types, covering six types of informational content as well as creative content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). A more detailed examination of the effect of inference-data characteristics showed that the performance of GPT was proportional to the length of the input text. However, even for the longest documents (where GPT performed best), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or of the model parameters, but the structural suitability of the PLM for the applied downstream task. The performance of the two PLMs was also compared by analyzing part-of-speech (POS) shares: BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs, but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task.
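The POS-share statistic mentioned at the end of the abstract can be illustrated in a few lines of Python. The tagged tokens below are hard-coded toy data; in practice they would come from a Korean morphological analyzer, and the tag set shown is only an assumption.

```python
# Illustrative sketch (not the paper's pipeline): computing part-of-speech
# shares for a document, the statistic related above to BART/GPT performance.
from collections import Counter

def pos_shares(tagged_tokens):
    """Return each POS tag's share of the total token count."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}

# toy output of a morphological analyzer: (surface form, POS tag) pairs
toy = [("영화", "Noun"), ("가", "Josa"), ("정말", "Adverb"), ("재미있", "Adjective")]
print(pos_shares(toy))   # {'Noun': 0.25, 'Josa': 0.25, 'Adverb': 0.25, 'Adjective': 0.25}
```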

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.133-140
    • /
    • 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient Korean sentiment classification method using word2vec and ensemble methods, both of which have recently been studied in various forms. For 200,000 Korean movie-review texts, we generate a POS-based bag-of-words (BOW) feature, a word2vec feature, and an integrated representation of the two. For sentiment classification we used the single classifiers Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine, and the ensemble classifiers AdaBoost, Bagging, Gradient Boosting, and Random Forest. The integrated feature representation, composed of the BOW feature (including adjectives and adverbs) and the word2vec feature, showed the highest sentiment classification accuracy. Empirical results also show that the single classifier SVM achieves the highest performance, while the ensemble classifiers perform similarly or slightly worse.
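A minimal sketch of the feature-integration idea: a bag-of-words matrix and averaged word2vec vectors are concatenated and fed to an ensemble classifier. The four toy reviews, the hyperparameters, and the choice of Random Forest are assumptions for illustration; the study itself used 200,000 reviews, POS filtering, and several single and ensemble classifiers.

```python
# Illustrative sketch (not the paper's exact setup): integrating BOW and
# averaged word2vec features and training an ensemble classifier on toy data.
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

docs = ["정말 재미있는 영화", "지루하고 나쁜 영화", "배우 연기가 매우 좋다", "최악의 영화"]
labels = [1, 0, 1, 0]                      # 1 = positive, 0 = negative
tokens = [d.split() for d in docs]

bow = CountVectorizer().fit_transform(docs).toarray()          # BOW feature

w2v = Word2Vec(tokens, vector_size=50, window=2, min_count=1, epochs=50)
emb = np.vstack([np.mean([w2v.wv[t] for t in sent], axis=0)    # word2vec feature
                 for sent in tokens])

X = np.hstack([bow, emb])                  # integrated feature representation
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X))
```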

A study about the aspect of translation on 'Hu(怖)' in novel 『Kokoro』 - Focusing on novels translated in Korean and English - (소설 『こころ』에 나타난 감정표현 '포(怖)'에 관한 번역 양상 - 한국어 번역 작품과 영어 번역 작품을 중심으로 -)

  • Yang, Jung-soon
    • Cross-Cultural Studies
    • /
    • v.53
    • /
    • pp.131-161
    • /
    • 2018
  • Emotional expressions are expressions that show the internal state of the mind or consciousness. They include vocabulary that describes emotion; sentence constructions that express emotion, such as exclamatory sentences and rhetorical questions; interjections; appellations; causatives; passives; adverbs conveying an attitude toward an idea; and style of writing. This study focuses on vocabulary that describes emotion and analyzes how expressions of 'Hu(怖)' in "Kokoro" are translated. The translations were analyzed in three respects: part of speech, handling of subjects, and classification of meanings. The results show that expressions of 'Hu(怖)' were sometimes, but not always, translated with the vocabulary suggested in the dictionary. Japanese vocabulary describing the emotion of 'Hu(怖)' was mostly translated into the corresponding part of speech in Korean; some adverbs required an added verb in translation, and in other cases different vocabulary was added or substituted to intensify the emotion. The part-of-speech correspondence in English, however, differed from that in Korean. Japanese sentences expressing 'Hu(怖)' with verbs were in many cases translated into participial expressions of passive verbs such as 'fear', 'dread', 'worry', and 'terrify', and idioms were translated with a focus on the function of the sentence rather than its form. Expressions realized as adverbs were often not accompanied by verbs of 'Hu(怖)'; instead, they were translated into participial expressions of passive verbs and adjectives such as 'dread', 'worry', and 'terrify'. The main agents of emotion appeared in the first person and the third person in simple sentences. When the main agent was the first person, the basic Japanese word order was preserved in the Korean translation, although adverbs of time and degree tended to be added; in English, the first-person agent was placed in subject position, though in some cases things or the causes of events were placed there instead, to convey the degree of 'Hu(怖)' the agent experienced. When the main agent of emotion was the third person, expressions of conjecture and supposition, or of a visual or auditory basis, were added in translation. In simple sentences without an explicit agent of emotion, the subject could be omitted in Korean, even though it is an essential component, because it could be recovered from context; in English these omitted subjects were recovered and translated, and they were not necessarily the human agents of emotion but could be things or the causes of events that specified the emotional expression.

KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee;Kwon, Hyuk-Chul;Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.1
    • /
    • pp.60-73
    • /
    • 2010
  • This paper describes KorLexClas 1.5, which provides a very large list of Korean numeral classifiers together with the co-occurring noun categories that select each classifier. Unlike the KorLex wordnets for other parts of speech, whose structure depends largely on their reference model (the Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 were built directly, which demands considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For efficiency of construction as well as the reliability of KorLexClas 1.5, we use the following processes: (1) using various language resources, cross-checked against one another, to select classifier candidates; (2) extending the list of numeral classifiers with shallow parsing techniques; (3) setting up the hierarchies of the numeral classifiers based on previous linguistic studies; and (4) determining the LUB (least upper bound) of each numeral classifier in KorLexNoun 1.5. This last process provides KorLexClas 1.5 with an open, extensible list of co-occurring nouns. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including MT.
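Step (4), determining the least upper bound of the nouns that co-occur with a classifier, can be illustrated with a tiny deepest-common-ancestor computation over a toy noun hierarchy. The hierarchy, the nouns, and the classifier '마리' example are assumptions for illustration, not KorLexNoun data.

```python
# Illustrative sketch only: finding the least upper bound (deepest common
# ancestor) of the nouns observed with a numeral classifier, in the spirit of
# step (4) above. The toy taxonomy maps each child node to its parent.
PARENT = {
    "강아지": "동물", "고양이": "동물",   # puppy, cat -> animal
    "동물": "생물", "나무": "생물",       # animal, tree -> living thing
    "생물": "개체",                       # living thing -> entity (root)
}

def ancestors(node):
    """Return the chain from a node up to the root, including the node itself."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def least_upper_bound(nouns):
    """Deepest node that dominates (or equals) every noun in the list."""
    common = set(ancestors(nouns[0]))
    for noun in nouns[1:]:
        common &= set(ancestors(noun))
    return max(common, key=lambda n: len(ancestors(n))) if common else None

# nouns seen with the classifier '마리' in a toy corpus
print(least_upper_bound(["강아지", "고양이"]))   # 동물
```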

Korean Syntactic Analysis by Using Clausal Segmentation of Embedded Clause (내포문의 단문 분할을 이용한 한국어 구문 분석)

  • Lee, Hyeon-Yeong;Lee, Yong-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.1
    • /
    • pp.50-58
    • /
    • 2008
  • Most Korean sentences are complex sentences consisting of a main clause and an embedded clause. Such sentences have more than one predicate, which causes various syntactic ambiguities during syntactic analysis. These ambiguities arise from phrase-attachment problems caused by the modifying scope of the embedded clause. To resolve them, we determine the scope of the embedded clause in the sentence and treat the clause as a single syntactic unit. In this paper, we use sentence pattern information (SPI) and the syntactic properties of Korean to determine the scope of the embedded clause. First, we split the complex sentence into the embedded clause and the main clause, under the condition that the embedded clause takes its maximal arguments; this is done with the SPI of the predicate in the embedded clause. Then, according to the syntactic properties of Korean, the embedded clause is treated as a noun phrase or an adverbial phrase within the main clause. With this method, the structure of a complex sentence is reduced to that of a simple clause, and phrase-attachment problems, which are mainly caused by the modifying scope, are resolved easily. We call this method clausal segmentation of embedded clauses. Empirical results from parsing 1,000 sentences show that our method decreases syntactic ambiguities by 88.32% compared with a method that does not use SPI and splits the sentence into basic clauses.
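The clausal segmentation step can be pictured with the short sketch below, which uses a toy sentence-pattern entry (the maximal argument frame of the embedded predicate) to bracket the embedded clause as a single unit. The SPI table, tokenization, and example sentence are invented for illustration and do not reproduce the paper's parser.

```python
# Illustrative sketch only: bracketing an embedded clause by absorbing at most
# the arguments licensed by the embedded predicate's sentence pattern (SPI).
SPI = {
    "읽는": ["이/가", "을/를"],   # adnominal 'read': subject + object slots
}

def segment(tokens, embedded_pred):
    """Replace the embedded predicate and its SPI arguments with one unit."""
    i = tokens.index(embedded_pred)
    start = max(0, i - len(SPI[embedded_pred]))    # maximal-argument split
    unit = "[" + " ".join(tokens[start:i + 1]) + "]"
    return tokens[:start] + [unit] + tokens[i + 1:]

print(segment(["아이가", "책을", "읽는", "모습이", "보였다"], "읽는"))
# -> ['[아이가 책을 읽는]', '모습이', '보였다']
```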

A Measurement of Lexical Relationship for Concept Network Based on Semantic Features (의미속성 기반의 개념망을 위한 어휘 연관도 측정)

  • Ock, Eun-Joo;Lee, Wang-Woo;Lee, Soo-Dong;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology
    • /
    • 2001.10d
    • /
    • pp.146-154
    • /
    • 2001
  • In this paper, we measure lexical relatedness based on the distributional information of semantic features that can be extracted from dictionary definitions, in order to build a concept network. First, semantic features are extracted from a corpus of about 112,000 dictionary definitions annotated with POS tags and semantic tags. The extractable semantic features include nominals, adverbials, and predicates; in this paper we primarily target predicates that stand in a modifying relation to nominals, namely those bearing the adnominal endings ('ㄴ/은/는'). After cleaning the roughly 45,000 extracted co-occurrence pairs, the relatedness between nominals and predicates is measured using mutual information (MI) from information theory. To mitigate data sparseness, the nominals and predicates in modifying relations were grouped into sets of similar words centered on basic vocabulary. The lexical relatedness measured from this distributional information of semantic features can be used to build a hierarchical structure among concepts by computing the degree to which semantic features are shared.
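The mutual-information measure mentioned in the abstract can be written down concretely as pointwise mutual information over co-occurrence counts; the sketch below uses invented counts purely to show the computation, not figures from the paper.

```python
# Illustrative sketch (toy counts): pointwise mutual information between a
# nominal and a predicate, PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI from raw co-occurrence counts and the total number of pairs."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# e.g. an adnominal pair such as '둥근'(round) + '달'(moon), with toy counts
print(round(pmi(count_xy=20, count_x=50, count_y=80, total=45000), 3))
```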
