• Title/Abstract/Keywords: Korean nouns

Disambiguation of Homograph Suffixes using Lexical Semantic Network (U-WIN)

  • 배영준;옥철영
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 1, No. 1 / pp. 31-42 / 2012
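  • Until now, most Korean language processing systems have handled suffix-derived nouns by registering as many of them as possible in the dictionary. However, because suffixes are highly productive, there is a limit to registering every suffix-derived noun. Unregistered suffix-derived nouns therefore need to be handled through semantic analysis of suffix-derived nouns. As part of that semantic analysis, this paper presents a method for disambiguating homograph suffixes using the Korean lexical semantic network (U-WIN). Experiments were conducted on 33,104 suffix-derived nouns containing homograph suffixes, taken from the morpho-semantically annotated Sejong corpus. For the experiments, the homograph suffixes were first sense-tagged, and the root preceding each suffix was extracted and mapped to a U-WIN node. Distance weights were then assigned to the U-WIN nodes that combine with homograph suffixes, and these weights were used for disambiguation. In experiments on the 35 homograph suffixes that appear in the Sejong corpus, out of 49 homograph suffix types, the method achieved 91.01% accuracy.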

Korean Unknown-noun Recognition using Strings Following Nouns in Words

  • 박기탁;서영훈
    • 한국콘텐츠학회논문지 / Vol. 17, No. 4 / pp. 576-584 / 2017
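  • Unknown words that are not registered in the dictionary cause problems not only in morphological analysis but in every area of natural language processing. This paper proposes a method for recognizing unknown words using strings following nouns. A string following a noun is the string that comes after the noun within an eojeol (a space-delimited word unit) containing that noun; examples include a particle, a suffix plus a particle, and a verbalizing suffix plus an ending. Eojeols containing unknown words that appear in a document are collected and sorted, and recognition is attempted only when two or more eojeols share the same leading part. In these eojeols, the shared leading part is assumed to be the unknown noun, and the span from the next syllable to the final syllable is assumed to be the string following the noun. The unknown noun is then determined using information on strings following nouns extracted from the Sejong corpus. In experiments on portal-site news articles, the method showed high recognition performance, with 99.64% precision and 99.46% recall, for unknown words that appeared in two or more forms.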
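
The prefix-sharing step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `guess_unknown_nouns`, the `min_len` threshold, and the small `post_noun_strings` set are assumptions made for illustration, whereas the paper derives its strings following nouns from the Sejong corpus.

```python
from collections import defaultdict


def guess_unknown_nouns(eojeols, post_noun_strings, min_len=2):
    """Guess unknown nouns from eojeols that share a leading part.

    eojeols: eojeols that contain an unknown word (assumed input).
    post_noun_strings: known strings following nouns (hypothetical set).
    min_len: minimum prefix length in syllables (illustrative threshold).
    """
    candidates = defaultdict(set)
    words = sorted(set(eojeols))
    # After sorting, eojeols with a shared leading part are adjacent.
    for left, right in zip(words, words[1:]):
        # Longest common prefix of two neighbouring eojeols.
        n = 0
        while n < min(len(left), len(right)) and left[n] == right[n]:
            n += 1
        if n < min_len:
            continue
        # The shared prefix is the noun candidate; the rest is the tail.
        for word in (left, right):
            candidates[left[:n]].add(word[n:])
    # Accept a candidate only if every tail is empty or a known
    # string following a noun.
    return sorted(
        prefix
        for prefix, tails in candidates.items()
        if all(tail == "" or tail in post_noun_strings for tail in tails)
    )


particles = {"은", "을", "에서", "이", "가"}
print(guess_unknown_nouns(["블록체인은", "블록체인을", "블록체인에서"], particles))
```

Comparing only neighbouring eojeols after sorting keeps the sketch short; a fuller version would also have to resolve competing prefixes of different lengths.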

Parallels between Korean Verbs and Nouns in Subcategorization

  • 노용균
    • 한국언어정보학회지:언어와정보 / Vol. 1 / pp. 27-65 / 1997
  • Nouns in the Korean language are subcategorized for various frames (called SUBCAT lists) in much the same way as verbs are. Assuming a monostratal grammar and building on analyses of various 'little elements' as clitics, such as the ones given by No (1991), Chae (1995, 1996), and Oh (1991), I delineate the ranges of SUBCAT lists for the Korean verbs and nouns and show that the two word-classes have heavily overlapping frames. Twenty-five SUBCAT lists are identified for verbs, and twenty-four for nouns, of which twenty-three find associated lexical items in both. By way of justification, I offer analyses of noun-verb collocations in terms of the new five-valued syntactic feature COLLOC along with SUBCAT, which subsume 'light verb' constructions. It is hoped that this work will give clear syntactic underpinnings to those who are concerned with practical lexicography.

Noun Sense Identification of Korean Nominal Compounds Based on Sentential Form Recovery

  • Yang, Seong-Il;Seo, Young-Ae;Kim, Young-Kil;Ra, Dong-Yul
    • ETRI Journal / Vol. 32, No. 5 / pp. 740-749 / 2010
  • In a machine translation system, word sense disambiguation has an essential role in the proper translation of words when the target word can be translated differently depending on the context. Previous research on sense identification has mostly focused on adjacent words as context information. Therefore, in the case of nominal compounds, sense tagging of unit nouns mainly depended on other nouns surrounding the target word. In this paper, we present a practical method for the sense tagging of Korean unit nouns in a nominal compound. To overcome the weakness of traditional methods regarding the data sparseness problem, the proposed method adopts complement-predicate relation knowledge that was constructed for machine translation systems. Our method is based on a sentential form recovery technique, which recognizes grammatical relationships between unit nouns. This technique makes use of the characteristics of Korean predicative nouns. To show that our method is effective on text in general domains, the experiments were performed on a test set randomly extracted from article titles in various newspaper sections.

A Compound Term Retrieval Model Using Statistical Information

  • 박영찬;최기선
    • 인지과학 / Vol. 6, No. 3 / pp. 65-81 / 1995
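  • Compound nouns are one of the most frequent forms of index terms in Korean and a linguistic phenomenon that is hard to handle with information retrieval models designed around English. A compound noun is a combination of two or more simple words and appears in various forms, so it has long been regarded as a major problem for indexing and retrieval. This paper presents a model that automatically acquires lexical information about compound nouns from the statistical behavior of their unit nouns and applies that information to retrieval. The model addresses both the difficulty of recognizing compound nouns at indexing time and the variety of forms at retrieval time, and it takes into account the linguistic characteristics of East Asian languages, including Korean.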

A Reverse Segmentation Algorithm of Compound Nouns Using Affix Information and Preference Pattern

  • 류방;백현철;김상복
    • 한국멀티미디어학회논문지 / Vol. 7, No. 3 / pp. 418-426 / 2004
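  • This paper proposes a reverse segmentation algorithm for Korean compound nouns that uses mutual information between syllables. Most Korean compound nouns are derived from Sino-Korean words, and preferred syllables exist between adjacent syllables, so this information is used together with affix information as segmentation rules. To evaluate performance, the proposed algorithm was applied to 36,061 compound nouns and achieved a segmentation accuracy of 99.3%. It also compared favorably with existing algorithms in the related experiments, and in particular it segmented most four- and five-syllable compound nouns correctly.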

Exploring the Microscopic Textual Characteristics of Japanese Prime Ministers' Diet Addresses by Measuring the Quantity and Diversity of Nouns

  • Suzuki, Takafumi;Kageura, Kyo
    • 한국언어정보학회:학술대회논문집 / 한국언어정보학회 2007년도 정기학술대회 / pp. 459-470 / 2007
  • This study explores the textual characteristics, more precisely the quantity and diversity of nouns, of Japanese prime ministers' Diet addresses. In the field of stylistics, textual characteristics independent of the content have been examined with the aim of detecting the authors, genres, and chronological variations of texts. This study focuses instead on textual characteristics related to the content of texts, namely the quantity and diversity of nouns, because our aim is to analyze texts to better understand two political phenomena: (a) the difference between the two types of Diet addresses delivered by Japanese prime ministers, and (b) the perceived changes made to these addresses by two powerful prime ministers. It is a case study of the microscopic characterization of texts, which has become more and more important with the expansion in the scope of stylistics and the production of a wide variety of new types of texts following the advent of the Web.

Stress Patterns of Compound Nouns in English

  • 이영길
    • 대한음성학회지:말소리 / No. 42 / pp. 25-36 / 2001
  • Stress assignment has been much discussed in the literature on English compound nouns. The general view of the stress pattern of English compound nouns is that a main stress falls on the first element and a secondary stress on the second element; however, a stress pattern is often employed that provides counterevidence to the traditional pedagogical approach. A new idea is suggested by Ladd (1984) that 'compound stress represents the deaccenting of the head of the compound.' Recent studies show that initial stressing does not indicate compounds and syntactic phrases are not always characterized by final stressing. In his pilot test Pennanen comments on the frequent variation of stress patterns on individual items, on the basis of which Bauer confirms Pennanen's results with different informants. This paper is an attempt to justify Bauer's analysis with the same data as Bauer's and different subjects. It turns out that the competences of native-speaker informants do not provide clear-cut answers. Some factors should be taken into account in assigning appropriate stress to compound nouns.

Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Ahn, Hee-Jeong;Kim, Kee-Won;Kim, Seung-Hoon
    • 한국컴퓨터정보학회논문지 / Vol. 22, No. 3 / pp. 107-113 / 2017
  • Most online bookstores provide users with bibliographic information about books rather than concrete information such as thematic words and atmosphere. Thematic words in particular help a user to understand books and cast a wide net. In this paper, we propose an efficient method for extracting thematic words from book text by applying a compound noun and noun phrase synthesis method. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts thematic words from book text by recognizing two types of noun phrases: a single noun, and a compound noun combined with single nouns. The recognized single nouns, compound nouns, and noun phrases are scored with TF-IDF weights and extracted as main words. In addition, this paper suggests a method that calculates the frequencies of subject, object, and other roles separately in the TF-IDF calculation, rather than just summing the frequencies of all nouns. Experiments are carried out in the field of economic management, and thematic word extraction is verified through a survey and book search. In 9 of the 10 experimental results used in this study, the thematic words extracted by the proposed method were more effective for understanding the content. It is also confirmed that the thematic words extracted by the proposed method give better book search results.
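
The role-aware frequency idea in this abstract can be sketched with a small Python example. It is an illustration under stated assumptions rather than the authors' formula: the abstract does not say how the role frequencies are combined, so the `ROLE_WEIGHTS` values and the function name `role_weighted_tfidf` are hypothetical, and the IDF term is the plain `log(N / df)` form.

```python
import math
from collections import Counter

# Hypothetical role weights; the abstract gives no concrete values.
ROLE_WEIGHTS = {"subject": 2.0, "object": 1.5, "other": 1.0}


def role_weighted_tfidf(docs):
    """Score terms with a role-weighted TF-IDF.

    docs: list of documents, each a list of (term, role) pairs, where
    role is "subject", "object", or "other" (assumed to come from an
    upstream parser that is not shown here).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})
    scores = []
    for doc in docs:
        # Term frequency, counted per role and combined via the weights.
        tf = Counter()
        for term, role in doc:
            tf[term] += ROLE_WEIGHTS.get(role, ROLE_WEIGHTS["other"])
        scores.append({t: w * math.log(n_docs / df[t]) for t, w in tf.items()})
    return scores


books = [
    [("market", "subject"), ("price", "object")],
    [("market", "other"), ("growth", "subject")],
]
for book_scores in role_weighted_tfidf(books):
    print(book_scores)
```

With these weights a term used as a subject counts twice as much as the same term in another role, which is one simple way to keep the role frequencies separate instead of summing raw counts.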

Processing Nominal Suffixes in Korean: Evidence from Priming Experiments

  • Ahn, Hee-Don;An, Duk-Ho;Choi, Jung-Yun;Hwang, Jong-Bai;Jeon, Moon-Gee;Kim, Ji-Hyon
    • 한국언어정보학회지:언어와정보 / Vol. 15, No. 1 / pp. 1-12 / 2011
  • This study investigates morphologically complex nouns in Korean through a series of priming studies. Two experiments examined whether morphological affixes on Korean nouns were decomposed or processed as a whole. Two types of morphological affixes were examined: morpho-syntactic case markers and the plural marker '-tul'. Results showed that priming occurred for the plural marker with SOAs of 80 ms and 160 ms, but no priming occurred for the morpho-syntactic case markers. These results suggest that the morphological processing for these two types of affixes differs. We argue that Korean nouns with the plural suffix are decomposed into the stem and affix, supporting the Decomposition Model (Pinker & Ullman, 2002). We suggest that while plural markers are truly morphological affixes, case markers in Korean are morpho-syntactic, and thus presuppose the existence of other syntactic elements, such as the matrix verb, hence the lack of priming effects.
