A Korean Homonym Disambiguation System Based on Statistical Model Using Weights

Kim, Jun-Su; Lee, Wang-Woo; Kim, Chang-Hwan; Ock, Cheol-young

Proceedings of the Korean Society for Language and Information Conference (한국언어정보학회:학술대회논문집)

2002.02a / Pages 166-176 / 2002

Korean Society for Language and Information (한국언어정보학회)

Kim, Jun-Su (Dept. of Computer Engineering & Information Technology, University of Ulsan, 680-749, San29, Mugeo-dong, Nam-gu, Ulsan) ;
Lee, Wang-Woo (Dept. of Computer Engineering & Information Technology, University of Ulsan, 680-749, San29, Mugeo-dong, Nam-gu, Ulsan) ;
Kim, Chang-Hwan (Dept. of Computer Engineering & Information Technology, University of Ulsan, 680-749, San29, Mugeo-dong, Nam-gu, Ulsan) ;
Ock, Cheol-young (Dept. of Computer Engineering & Information Technology, University of Ulsan, 680-749, San29, Mugeo-dong, Nam-gu, Ulsan)

Published : 2002.02.01

Abstract

A homonym can be disambiguated by other words in its context, such as the nouns and predicates used with it. This paper uses semantic information (co-occurrence data) obtained from the definitions in the part-of-speech (POS) tagged UMRD-S$^{1)}$. We analyzed the results of an experiment on a homonym disambiguation system based on a statistical model to which Bayes' theorem is applied, and propose a model that adds a weight on the sense rate and a weight on the distance to adjacent words in order to improve accuracy. Applying the system, which uses this semantic information, to homonyms appearing in dictionary definition sentences gave an average accuracy of 98.32% for the 200 most frequent homonyms. We then selected 49 of those 200 homonyms (31 substantives and 18 predicates) and ran an experiment on 50,703 sentences, each containing one of the 49 homonyms, extracted from the 3.5-million-word Sejong Project tagged corpus (i.e. a corpus of morphologically analyzed words). Assigning the weight of the sense rate (prior probability) and the weight of distance to the 5 words before and after the homonym to be disambiguated gave accuracy 2.93% higher than disambiguation systems based on existing statistical models.
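
The abstract does not give the scoring formula, so the following Python sketch is only one plausible reading of "Bayes' theorem with a sense-rate weight and a distance weight", not the authors' implementation. The function names `train` and `disambiguate`, the `prior_weight` parameter, the `1/d` distance weighting, the add-one smoothing, and the toy training sentences are all assumptions made for illustration. Only the window of 5 words on each side comes from the abstract.

```python
import math
from collections import Counter, defaultdict


def train(examples, window=5):
    """Collect sense frequencies and sense/context co-occurrence counts.

    examples: iterable of (tokens, index_of_homonym, sense_label).
    """
    sense_counts = Counter()
    cooc = defaultdict(Counter)
    for tokens, idx, sense in examples:
        sense_counts[sense] += 1
        lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
        for j in range(lo, hi):
            if j != idx:
                cooc[sense][tokens[j]] += 1
    vocab = {word for counts in cooc.values() for word in counts}
    return sense_counts, cooc, vocab


def disambiguate(tokens, idx, model, window=5, prior_weight=1.0,
                 distance_weight=lambda d: 1.0 / d):
    """Return the sense with the highest weighted log-probability score.

    prior_weight scales the log prior (the "sense rate"); distance_weight
    scales each context word's log likelihood by its distance d >= 1 from
    the homonym. Both weighting schemes are assumptions of this sketch.
    """
    sense_counts, cooc, vocab = model
    total = sum(sense_counts.values())
    lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = prior_weight * math.log(count / total)
        # Add-one smoothing; the extra +1 reserves mass for unseen words.
        denom = sum(cooc[sense].values()) + len(vocab) + 1
        for j in range(lo, hi):
            if j == idx:
                continue
            p = (cooc[sense][tokens[j]] + 1) / denom
            score += distance_weight(abs(j - idx)) * math.log(p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy data (hypothetical): the homonym "배" as "pear" or "ship".
toy_examples = [
    (["맛있는", "배", "먹다"], 1, "pear"),
    (["달콤한", "배", "먹다"], 1, "pear"),
    (["큰", "배", "타다"], 1, "ship"),
    (["항구", "배", "타다"], 1, "ship"),
]
toy_model = train(toy_examples)
print(disambiguate(["작은", "배", "타다"], 1, toy_model))
```

Setting `prior_weight=0` and `distance_weight=lambda d: 1.0` reduces the score to an unweighted naive Bayes likelihood, which makes it easy to compare the weighted and unweighted variants on the same data.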
