• Title/Summary/Keyword: Semantic Ambiguity

A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora

  • Lee, Ki-Yong;Markus Schulze
    • Language and Information / v.6 no.2 / pp.105-128 / 2002
  • Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis, which provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work therefore aims (1) to introduce a rule-based Korean morphological analyzer called Kormoran, based on the principle of linearity, which prohibits combining left-to-right and right-to-left analysis as well as backtracking, and (2) to show how it can be used as a POS tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information from the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS taggers. The focus of our present analysis is on Korean text.
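
The projection from full analyses to POS tags described above can be sketched as follows. This is a hypothetical illustration, not Kormoran's code: the `Morpheme` type, the `to_pos_tags` function, and the Sejong-style tags (NP, JX, VV, ETM) are all assumptions. It shows that distinctions present in the full analyses, here two different verb lemmas, disappear once only the tags are kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    lemma: str
    pos: str
    # Further morpho-syntactic features (tense, honorific, ...) that a POS tagger discards.
    features: tuple = ()


def to_pos_tags(analyses):
    """Project full morphological analyses onto POS-tag sequences.

    Lemmas and features are dropped; analyses that differ only in those
    collapse into one tag sequence. Order of first appearance is kept.
    """
    tag_sequences = []
    for analysis in analyses:
        tags = tuple(m.pos for m in analysis)
        if tags not in tag_sequences:
            tag_sequences.append(tags)
    return tag_sequences


# Three hypothetical analyses of the eojeol 나는 (Sejong-style tags assumed):
# 나 'I' + topic marker 는; 날(다) 'fly' + adnominal 는; 나(다) 'come out' + adnominal 는.
analyses_of_naneun = [
    [Morpheme("나", "NP"), Morpheme("는", "JX")],
    [Morpheme("날", "VV"), Morpheme("는", "ETM")],
    [Morpheme("나", "VV"), Morpheme("는", "ETM")],
]
print(to_pos_tags(analyses_of_naneun))
```

Three full analyses reduce to two tag sequences, because the two verb readings share the tags VV+ETM.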

Fake News Detection Using Deep Learning

  • Lee, Dong-Ho;Kim, Yu-Ri;Kim, Hyeong-Jun;Park, Seung-Myun;Yang, Yu-Jun
    • Journal of Information Processing Systems / v.15 no.5 / pp.1119-1130 / 2019
  • With the widespread use of Social Network Services (SNS), fake news, which disguises false information as legitimate media reporting, has become a major social issue. This paper proposes a deep learning architecture for detecting fake news written in Korean. Previous work has proposed suitable fake news detection models for English, but two issues prevent existing models from being applied to Korean. First, Korean can express the same meaning in shorter sentences than English, so the resulting feature scarcity makes it difficult to train a deep neural network. Second, morpheme ambiguity makes semantic analysis difficult. We addressed these issues by implementing a system that uses various convolutional neural network-based deep learning architectures together with fastText, a word-embedding model trained on syllable units. After training and testing the implementation, we achieved meaningful accuracy in classifying discrepancies between the body and its context, but accuracy was low in classifying discrepancies between the headline and the body.
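
The abstract mentions fastText trained on syllable units. As a minimal sketch, assuming that Hangul syllables play the role fastText normally gives to characters, the function below extracts boundary-marked syllable n-grams. A word vector would then be the sum of the vectors of these units. The function name and the n-gram range are hypothetical and are not taken from the paper.

```python
import unicodedata


def syllable_ngrams(word, n_min=1, n_max=3):
    """fastText-style subword units for a Korean word, with syllables as characters.

    Each precomposed Hangul syllable is one Python character, so slicing the
    string yields syllable n-grams. '<' and '>' mark the word boundaries.
    """
    word = unicodedata.normalize("NFC", word)  # compose any decomposed jamo into syllables
    padded = "<" + word + ">"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram in ("<", ">") or gram == padded:
                continue  # skip bare boundary markers; the whole word is added below
            grams.append(gram)
    grams.append(padded)  # the whole bracketed word is kept as its own unit
    return grams


print(syllable_ngrams("가짜", n_max=2))
```

Because two words that share syllables share n-grams, rare or short words still receive informative vectors. This is one plausible way to ease the feature scarcity mentioned above.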

A study on semantic ambiguity in the Korean Named Entity Recognition (한국어 개체명 인식 과제에서의 의미 모호성 연구)

  • Kim, Seonghyun;Song, Youngsook;Song, Chisung;Han, Jiyoon
    • Annual Conference on Human and Language Technology / 2021.10a / pp.203-208 / 2021
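  • Focusing on lexical items whose named-entity category changes with context, this paper examines recognition performance on cross-tagged named entities, broken down into label accuracy, span accuracy, and accuracy by sentence constituent and by position in the sentence. Label accuracy increased in the order KoGPT2, mBERT, KLUE-RoBERTa. For span accuracy, mBERT slightly outperformed KLUE-RoBERTa, while KoGPT2 showed very low accuracy. However, when the named entity was located at the end of the sentence, KoGPT2's performance improved to a level comparable to the other models. The recognizer's good performance in sentence-final position appears to stem from two factors: in the corpus used for the experiments, when the constituent is a predicate there is little noun stacking and the constructions are patterned; and KoGPT2 is a decoder-based model. Further research is needed to confirm this.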
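
The split between label accuracy and span accuracy can be made concrete with a small scorer. The abstract does not give the paper's exact definitions. As an assumption, a gold entity here counts as label-correct if any overlapping prediction has the same label, and as span-correct if any prediction has exactly the same boundaries. `Entity`, `label_and_span_accuracy`, and the KLUE-style labels (LC, OG, PS) are illustrative.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    label: str


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def label_and_span_accuracy(gold, pred):
    """Return (label accuracy, span accuracy) over the gold entities.

    Label: some overlapping prediction carries the gold label (boundaries ignored).
    Span: some prediction has exactly the gold boundaries (label ignored).
    """
    if not gold:
        return 0.0, 0.0
    label_hits = sum(any(overlaps(g, p) and p.label == g.label for p in pred) for g in gold)
    span_hits = sum(any((p.start, p.end) == (g.start, g.end) for p in pred) for g in gold)
    return label_hits / len(gold), span_hits / len(gold)


gold = [Entity(0, 2, "LC"), Entity(5, 6, "OG"), Entity(8, 10, "PS")]
pred = [Entity(0, 2, "LC"), Entity(5, 7, "OG"), Entity(8, 10, "PS")]
print(label_and_span_accuracy(gold, pred))  # label 1.0, span 2/3
```

Scoring the two aspects separately lets a model such as KoGPT2 be credited for a correct category even when its boundaries are off.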

KOREAN TOPIC MODELING USING MATRIX DECOMPOSITION

  • June-Ho Lee;Hyun-Min Kim
    • East Asian mathematical journal / v.40 no.3 / pp.307-318 / 2024
  • This paper explores the application of matrix factorization, specifically CUR decomposition, in the clustering of Korean language documents by topic. It addresses the unique challenges of Natural Language Processing (NLP) in dealing with the Korean language's distinctive features, such as agglutinative words and morphological ambiguity. The study compares the effectiveness of Latent Semantic Analysis (LSA) using CUR decomposition with the classical Singular Value Decomposition (SVD) method in the context of Korean text. Experiments are conducted using Korean Wikipedia documents and newspaper data, providing insight into the accuracy and efficiency of these techniques. The findings demonstrate the potential of CUR decomposition to improve the accuracy of document clustering in Korean, offering a valuable approach to text mining and information retrieval in agglutinative languages.
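
The CUR-versus-SVD comparison can be sketched in NumPy. For a term-document matrix, CUR keeps actual columns (documents) and rows (terms), whereas SVD builds abstract singular vectors. This sketch picks the k largest-norm columns and rows deterministically. That is a simplification of the randomized sampling schemes usually used for CUR, and the abstract does not say how the paper selects columns. The function names are hypothetical.

```python
import numpy as np


def cur_decomposition(A, k):
    """Deterministic CUR sketch: keep the k columns and k rows of A with the
    largest squared Euclidean norms, then fit the small linking matrix U."""
    col_idx = np.argsort(-np.sum(A ** 2, axis=0))[:k]
    row_idx = np.argsort(-np.sum(A ** 2, axis=1))[:k]
    C = A[:, col_idx]  # actual documents (columns) of the term-document matrix
    R = A[row_idx, :]  # actual terms (rows)
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R


def truncated_svd(A, k):
    """Rank-k SVD approximation, the classical LSA baseline."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.random((40, 2)) @ rng.random((2, 25))  # synthetic rank-2 "term-document" matrix
C, U, R = cur_decomposition(A, 2)
print(np.linalg.norm(A - C @ U @ R), np.linalg.norm(A - truncated_svd(A, 2)))
```

On an exactly rank-2 matrix both reconstructions are exact up to rounding. The comparison becomes informative when k is smaller than the rank, where CUR trades some accuracy for factors that remain interpretable as real documents and terms.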

Update Semantic Preserving Object-Oriented View (갱신 의미 보존 객체-지향 뷰)

  • 나영국
    • The KIPS Transactions:PartD / v.8D no.1 / pp.32-43 / 2001
  • Because of its limited data modeling power and the view update ambiguity, the relational view is used only to a limited extent in engineering applications. By contrast, object-oriented database views could play a vital role in defining custom interfaces for engineering applications, because object-oriented views overcome both of these limitations of relational views. Above all, a data interface for engineering applications should fully support updates. More specifically, updates against the data interface need to be unambiguously defined, and their semantic behavior should be equal to that of base schema updates. For this purpose, we define the notion of update semantic preservation, meaning that view updates display the same semantics as base schema updates. To show that this property is feasible, we present specific, concrete algorithms for update semantic preserving updates in MultiView, an object-oriented database view system specialized for CAD. This paper finds that for virtual classes to form a schema with 'isa' relationships, rather than just a group of classes, the update semantics of the virtual classes must be defined so that the implied meaning of the 'isa' relationships between classes is not violated. In addition, as sufficient conditions, we derive the update semantics of the virtual classes and the conditions under which they can constitute a schema, so that view schemas look like base schemas. To the best of my knowledge, this is the first research to present sufficient conditions under which object-oriented views can be defined as integrated schemas rather than as separate classes.
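
The requirement that updates on virtual classes respect 'isa' can be illustrated with class extents: every subclass extent must stay inside the extents of its superclasses. The sketch below propagates inserts upward and deletes downward to keep that invariant. It is a hypothetical illustration, not MultiView's algorithm. `VirtualClass`, `isa_holds`, and the CAD class names are assumptions.

```python
class VirtualClass:
    """Hypothetical class node whose extent is a set of object ids."""

    def __init__(self, name, superclasses=()):
        self.name = name
        self.superclasses = list(superclasses)
        self.subclasses = []
        self.extent = set()
        for sup in self.superclasses:
            sup.subclasses.append(self)

    def insert(self, obj):
        # isa: an instance of this class is also an instance of every superclass.
        self.extent.add(obj)
        for sup in self.superclasses:
            sup.insert(obj)

    def delete(self, obj):
        # isa: an object leaving this class must also leave every subclass.
        self.extent.discard(obj)
        for sub in self.subclasses:
            sub.delete(obj)


def isa_holds(cls):
    """Check recursively that each subclass extent is contained in its superclass extent."""
    return all(sub.extent <= cls.extent and isa_holds(sub) for sub in cls.subclasses)


part = VirtualClass("Part")
gear = VirtualClass("Gear", [part])
spur_gear = VirtualClass("SpurGear", [gear])
spur_gear.insert("g1")  # g1 becomes a Gear and a Part as well
part.delete("g1")       # removing it from Part removes it from Gear and SpurGear
print(isa_holds(part), part.extent, spur_gear.extent)
```

An update that skipped either propagation step would leave a subclass instance outside its superclass. The view would then behave like a loose group of classes rather than a schema.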

Annotation Modeling and System Implementation for Hand-held Environment (휴대용 단말기 환경을 위한 Annotation 모델링 및 시스템 구현)

  • Sohn, Won-Sung
    • Journal of The Korean Association of Information Education / v.10 no.2 / pp.219-226 / 2006
  • To create annotation information accurately in a free-form annotation environment, the ambiguity that arises between the geometric information and the annotations during the analysis stage needs to be resolved. This paper therefore identifies and analyzes the ambiguity that can occur between free-form markings and various contexts in an XML-based annotation environment, and presents methods for resolving it. The proposed method is based on context, which includes various textual and structural information between the free-form marking and the annotated part. The results show that the proposed method identifies the annotated areas included in the free-form marking information more accurately, achieving more accurate exchange results among multiple users in a heterogeneous document environment. This study can be effectively applied to eLearning, Cyber-Class, and IETM (Interactive Electronic Technical Manual) environments.
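
Only the geometric part of the problem is sketched here: deciding which words a hand-drawn mark covers when the mark only partially overlaps some of them. The proposed method additionally uses textual and structural XML context, which this sketch omits. `Box`, `marked_words`, the approximation of a stroke by its bounding rectangle, and the 0.5 coverage threshold are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self):
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def intersection(a, b):
    # Width or height may be negative when the boxes are disjoint; area() clamps that to 0.
    return Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1))


def marked_words(words, mark, threshold=0.5):
    """Return words whose box is covered by the mark's bounding box
    for at least `threshold` of the word's own area."""
    selected = []
    for text, box in words:
        if box.area() == 0:
            continue
        if intersection(box, mark).area() / box.area() >= threshold:
            selected.append(text)
    return selected


words = [("free", Box(0, 0, 10, 5)), ("form", Box(12, 0, 22, 5)), ("mark", Box(24, 0, 34, 5))]
stroke = Box(-1, -1, 14, 6)  # bounding box of a pen circle drawn around "free"
print(marked_words(words, stroke))       # ['free']
print(marked_words(words, stroke, 0.1))  # ['free', 'form']
```

Changing the threshold changes which partially covered words are included. That is the kind of ambiguity the contextual information in the paper is meant to resolve.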

Two-Level Clausal Segmentation using Sense Information (의미 정보를 이용한 이단계 단문분할)

  • Park, Hyun-Jae;Woo, Yo-Seop
    • The Transactions of the Korea Information Processing Society / v.7 no.9 / pp.2876-2884 / 2000
  • Clausal segmentation is a method of parsing Korean sentences by segmenting one long sentence into several clauses according to their predicates. So far, most research has been useful for literary sentences, but long sentences increase the complexity of syntactic analysis. This paper therefore proposes Two-Level Clausal Segmentation using sense information, which was designed and implemented to solve this problem. Clausal segmentation and the understanding of word senses can reduce syntactic and semantic ambiguity. Clausal segmentation using sense information is necessary because common sentences exhibit structural ambiguity and frequent omission of auxiliary words. The Two-Level Clausal Segmentation System (TLCSS) consists of a Complement Selection Process (CSP) and a Noncomplement Expansion Process (NEP). First, CSP matches sentence elements against a subcategorization dictionary and a noun thesaurus; this step yields the complements and the subcategorization pattern. Second, NEP uses syntactic properties and other methods to expand the noncomplements; this step yields the segmented sentences. We present a technique for estimating the precision of the Two-Level Clausal Segmentation System and report clausal segmentation results on the 25,000 manually sense-tagged corpus constructed by the ETRI-KONAN group. The Two-Level Clausal Segmentation System achieves a clausal segmentation precision of 91.8%.
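
The Complement Selection Process can be sketched as frame matching. Each slot of a predicate's subcategorization frame needs a suitable case particle and a semantic class from the noun thesaurus. The dictionaries, the frame format, and `select_complements` are hypothetical toys, not the TLCSS resources. The phrases left over are the noncomplements that NEP would then handle.

```python
# Hypothetical dictionaries: one subcategorization frame for 먹다 'eat';
# each slot = (admissible case particles, required semantic class).
SUBCAT = {"먹다": [[({"이", "가"}, "ANIMATE"), ({"을", "를"}, "FOOD")]]}
THESAURUS = {"철수": {"HUMAN", "ANIMATE"}, "밥": {"FOOD"}, "식당": {"PLACE"}}


def select_complements(predicate, phrases, subcat, thesaurus):
    """Try each frame of `predicate`; fill every slot with an unused (noun, particle)
    phrase whose particle and noun class fit. Return (frame, complements, rest)."""
    for frame in subcat.get(predicate, []):
        used, complements = set(), []
        for particles, sem_class in frame:
            match = next(
                (i for i, (noun, particle) in enumerate(phrases)
                 if i not in used and particle in particles
                 and sem_class in thesaurus.get(noun, set())),
                None,
            )
            if match is None:
                break  # this frame cannot be satisfied; try the next one
            used.add(match)
            complements.append(phrases[match])
        else:
            rest = [ph for i, ph in enumerate(phrases) if i not in used]
            return frame, complements, rest
    return None, [], list(phrases)


# 철수가 식당에서 밥을 먹었다 'Cheolsu ate rice at the restaurant'
phrases = [("철수", "가"), ("식당", "에서"), ("밥", "을")]
frame, complements, rest = select_complements("먹다", phrases, SUBCAT, THESAURUS)
print(complements, rest)
```

The thesaurus check is what uses sense information: a noun with the right particle but the wrong semantic class is not accepted as a complement.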

De re context and some semantic traits of 'rago' (대물(de re) 문맥과 '-라고'의 몇 가지 의미론적 특성)

  • Min, Chanhong
    • Korean Journal of Logic / v.16 no.1 / pp.61-85 / 2013
  • After introducing the concept of de re belief and discussing the de re/de dicto ambiguity in belief and modal contexts, the author concludes that Korean modal sentences do not show any traits that distinguish them from English. After discussing this ambiguity in negative sentences à la Russell, he tries to show that Korean provides two ways of constructing negation, one of which corresponds to de re negation (primary occurrence, in Russell's terms). A de re reading creates a referentially transparent context and thus permits substitution of identicals salva veritate; a de dicto reading does not. The Korean ending 'rago', used with quotation verbs, speech act verbs, and cognitive attitude verbs, deserves some attention in that it permits de re sentences in addition to sentences that are ambiguous between de re and de dicto readings. 'Rago' also makes the speaker's commitment to the content of the intensionally contained clause 'neutral', in contrast with other Korean endings such as 'um/im' and 'raneun gut', which express the speaker's positive commitment. This explains why the maxim of Western epistemology that knowledge presupposes truth does not hold in Korean 'rago' sentences.
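
The scope distinction behind the two negations can be written out in the standard Russellian way. This is a textbook illustration, not the author's own formalization. Let K stand for "is King of France" and B for "is bald". For belief, let F and G be arbitrary predicates and Bel_a "a believes that".

```latex
\begin{align*}
\text{negation, primary occurrence (de re):} \quad
  & \exists x\,\bigl(Kx \land \forall y\,(Ky \rightarrow y = x) \land \lnot Bx\bigr) \\
\text{negation, secondary occurrence (de dicto):} \quad
  & \lnot \exists x\,\bigl(Kx \land \forall y\,(Ky \rightarrow y = x) \land Bx\bigr) \\
\text{belief, de re:} \quad
  & \exists x\,\bigl(Fx \land \mathrm{Bel}_a\, Gx\bigr) \\
\text{belief, de dicto:} \quad
  & \mathrm{Bel}_a\, \exists x\,\bigl(Fx \land Gx\bigr)
\end{align*}
```

Only the primary-occurrence reading entails that a King of France exists. In the de re belief form, the variable is bound outside the belief operator, which is why substitution of co-referring terms preserves truth there.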

Hypergraph model based Scene Image Classification Method (하이퍼그래프 모델 기반의 장면 이미지 분류 기법)

  • Choi, Sun-Wook;Lee, Chong Ho
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.2 / pp.166-172 / 2014
  • Image classification is an important problem in computer vision. However, it is very challenging because of the variability, ambiguity, and scale changes present in images. In this paper, we propose a hypergraph-based modeling method that can consider higher-order relationships among the semantic attributes of a scene image, and we apply it to scene image classification. To generate a hypergraph optimized for a specific scene category, we propose a novel search method based on a probabilistic subspace method. We also propose a method that aggregates the expression values of the member semantic attributes belonging to the searched subsets, using a linear transformation obtained via likelihood-based estimation. To verify the superiority of the proposed method, we showed experimentally that the feature vectors it generates have better discrimination power than those of existing methods. In addition, in a scene classification experiment, the proposed method shows classification performance competitive with conventional methods.
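
The aggregation step combines the scores of the semantic attributes in a hyperedge by a linear transformation. It can be sketched as one weighted sum per hyperedge. In the paper the weights come from likelihood-based estimation. Here they are fixed by hand, and the attribute names, weights, and `hyperedge_features` are hypothetical.

```python
def hyperedge_features(attribute_scores, hyperedges):
    """One feature per hyperedge: a weighted sum (linear map) of the scores
    of the semantic attributes that belong to that hyperedge."""
    return [
        sum(weight * attribute_scores.get(attr, 0.0) for attr, weight in edge.items())
        for edge in hyperedges
    ]


# Hypothetical attribute detector outputs for one scene image.
scores = {"sky": 0.9, "sand": 0.8, "water": 0.7, "tree": 0.1}
# Hypothetical hyperedges for a "beach" category; each maps member attribute -> weight.
beach_edges = [{"sky": 0.5, "sand": 0.5}, {"water": 1.0, "tree": -1.0}]
print(hyperedge_features(scores, beach_edges))
```

Because each hyperedge joins several attributes at once, its feature captures their co-occurrence, which pairwise graph edges cannot express directly.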

A Keyword Search Model based on the Collected Information of Web Users (웹 사용자 누적 사용정보 기반의 키워드 검색 모델)

  • Yoon, Sung-Hee
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.4 / pp.777-782 / 2012
  • This paper proposes a technique that uses word senses and user feedback to improve the performance of web information retrieval over retrieval based on ambiguous user queries and indexes. Disambiguating query words by sense can eliminate irrelevant pages from the search results. We build a word sense knowledge base according to the semantic categories of the nouns used as retrieval indexes, and categorize web pages accordingly. User feedback that determines the query sense, together with users' information-seeking behavior on pages, can improve the precision of the retrieval system.
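
Here is a minimal sketch of sense-based filtering with user feedback. The query sense is chosen by counting past user choices, and pages are kept only if their semantic categories include that sense. The knowledge base, the feedback format, and the function names are hypothetical. The abstract does not describe the paper's actual sense inventory or ranking.

```python
from collections import Counter


def choose_sense(word, sense_kb, feedback):
    """Pick the sense of `word` that users confirmed most often.

    `feedback` is a list of (query word, sense the user chose) pairs.
    Ties, including the no-feedback case, go to the first sense listed in `sense_kb`.
    """
    senses = sense_kb.get(word, [])
    if not senses:
        return None
    votes = Counter(sense for w, sense in feedback if w == word)
    return max(senses, key=lambda s: votes[s])


def filter_pages(pages, sense):
    """Keep only pages whose semantic categories include the chosen sense."""
    return [url for url, categories in pages if sense in categories]


# 배 is ambiguous: 'pear', 'ship', or 'belly'. All data below are hypothetical.
sense_kb = {"배": ["FRUIT", "VESSEL", "BODY"]}
feedback = [("배", "VESSEL"), ("배", "VESSEL"), ("배", "FRUIT")]
pages = [("a.html", {"VESSEL"}), ("b.html", {"FRUIT"}), ("c.html", {"VESSEL", "BODY"})]
sense = choose_sense("배", sense_kb, feedback)
print(sense, filter_pages(pages, sense))
```

Filtering by sense removes b.html, which matches the query string but not the intended meaning. This is the source of the precision gain described above.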