• Title/Summary/Keyword: Korean Standard Unabridged Dictionary

A Comparative Study of Mathematical Terms in Korean Standard Unabridged Dictionary and the Editing Material (표준국어대사전과 편수자료의 수학 용어 비교 조사)

  • Her, Min
    • Journal for History of Mathematics / v.33 no.4 / pp.237-257 / 2020
  • In this paper, we classify the mathematical terms in the Korean Standard Unabridged Dictionary into four groups: ① group 1 consists of terms that coincide with the mathematical terms in the 2015 Editing Material; ② group 2 consists of terms that are synonyms, old terms, or inflected forms of the mathematical terms in the Editing Material; ③ group 3 consists of terms that belong to neither group 1 nor group 2 but relate to elementary or secondary school mathematics; ④ group 4 consists of terms that do not relate to elementary or secondary school mathematics. We then compare these groups with the mathematical terms in the Editing Material. In this study, we identify the mathematical terms that appear in the Editing Material but not in the Korean Standard Unabridged Dictionary. Using the synonyms and old terms of the mathematical terms in the Editing Material, we infer a rough tendency regarding which terms are included in the Editing Material. By investigating the terms in groups 3 and 4, we identify mathematical terms that could be included in the Editing Material. We also identify incorrect or inconsistent explanations in the Korean Standard Unabridged Dictionary.
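The four-group classification in this abstract can be illustrated with a minimal sketch. It assumes the term lists are available as plain Python sets. The sample terms, the `variants` mapping, and the `school_related` set are hypothetical English stand-ins, not data from the paper, and the paper's actual procedure may have been manual.

```python
def classify_term(term, editing_terms, variants, school_related):
    """Return the group number (1-4) for a dictionary term.

    editing_terms:  terms listed in the Editing Material (hypothetical data)
    variants:       maps a synonym, old term, or inflected form to its
                    Editing Material term (hypothetical data)
    school_related: terms judged relevant to school mathematics (hypothetical)
    """
    if term in editing_terms:
        return 1  # coincides with an Editing Material term
    if term in variants:
        return 2  # synonym, old term, or inflected form
    if term in school_related:
        return 3  # school mathematics, but not in groups 1 or 2
    return 4      # unrelated to school mathematics


# Hypothetical English stand-ins for the real Korean term lists.
dictionary_terms = {"triangle", "trigon", "abacus", "manifold"}
editing_terms = {"triangle", "histogram"}
variants = {"trigon": "triangle"}
school_related = {"abacus"}

groups = {
    term: classify_term(term, editing_terms, variants, school_related)
    for term in dictionary_terms
}

# Terms in the Editing Material that the dictionary lacks.
missing_from_dictionary = editing_terms - dictionary_terms
```

The order of the checks matters: a term is tested against group 1 first, so a term that is both an Editing Material term and school-related lands in group 1, which matches the abstract's definition of group 3 as excluding groups 1 and 2.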

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo; Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been carried out. Social media on the Internet generate unstructured or semi-structured data every second, usually written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from the user's intention. Although much progress has been made in recent years in improving search engines so that they return appropriate results, considerable room for improvement remains. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation from the examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method using a Naïve Bayes model, a supervised learning algorithm, with the Korean Standard Unabridged Dictionary and the Sejong Corpus. The Korean Standard Unabridged Dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and sense. In the experiments, the Korean Standard Unabridged Dictionary and the Sejong Corpus were evaluated both in combination and as separate entities, using cross-validation. Only nouns, the target of word sense disambiguation, were selected. In addition, 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were combined into the corpus. The Sejong Corpus was easily merged with the Korean Standard Unabridged Dictionary because the Sejong Corpus was tagged with the sense indices defined by the Korean Standard Unabridged Dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. This study also shows the effectiveness of the corpus merged from the examples in the Korean Standard Unabridged Dictionary and the Sejong Corpus: the experiments show better precision and recall with the merged corpus. This study suggests that the method can in practice enhance the performance of Internet search engines and help determine the meaning of a sentence more accurately in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all senses are independent. Although this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. The effectiveness of word sense disambiguation may also be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
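The general idea of training a Naïve Bayes sense classifier on dictionary example sentences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `NaiveBayesWSD`, the English toy sentences, the sense labels, and the use of Laplace (add-one) smoothing are all assumptions, and the sketch covers only the Naïve Bayes step, not the morphological analysis or the vector space model mentioned in the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes over context words (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing (assumed)
        self.sense_counts = Counter()           # examples seen per sense
        self.word_counts = defaultdict(Counter) # context word counts per sense
        self.vocab = set()

    def fit(self, examples):
        # Each example is (sense_id, list of context tokens).
        for sense, tokens in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log prior of the sense
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for tok in tokens:
                if tok in self.vocab:    # skip words never seen in training
                    count = self.word_counts[sense][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy stand-ins: examples from a dictionary and from a sense-tagged corpus
# that share the same sense indices, so they can simply be concatenated.
dictionary_examples = [
    ("bank_1", ["deposit", "money", "bank"]),
    ("bank_2", ["river", "bank", "water"]),
]
corpus_examples = [
    ("bank_1", ["bank", "loan", "interest"]),
    ("bank_2", ["fishing", "river", "bank"]),
]
merged_corpus = dictionary_examples + corpus_examples

model = NaiveBayesWSD().fit(merged_corpus)
financial = model.predict(["money", "loan"])
riverside = model.predict(["river", "water"])
```

Because both sources use the same sense labels, merging is plain list concatenation here, which mirrors the abstract's remark that the Sejong Corpus could be merged easily because it was tagged with the dictionary's sense indices.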

Selection of Korean General Vocabulary for Machine Readable Dictionaries (자연언어처리용 전자사전을 위한 한국어 기본어휘 선정)

  • 배희숙; 이주호; 시정곤; 최기선
    • Language and Information / v.7 no.1 / pp.41-54 / 2003
  • According to Jeong Ho-seong (1999), Koreans use on average only 20% of the 508,771 entries of the Korean Standard Unabridged Dictionary. To build a machine-readable dictionary (MRD) for natural language processing, it is necessary to select the Korean lexical units that are used frequently and are regarded as basic words. In this study, the selection is done semi-automatically using the KAIST large corpus. Of about 220,000 morphemes extracted from the corpus of 40,000,000 eojeols, 50,637 morphemes (54,797 senses) are selected. The coverage of these morphemes in various texts is then examined with two sub-corpora of different styles. The total coverage is 91.21% in formal style and 93.24% in informal style. The coverage of the 6,130 first-degree morphemes is 73.64% and 81.45%, respectively.
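The coverage figures above can be read as the share of running morpheme tokens in a text that belong to the selected vocabulary. The abstract does not spell out the exact definition, so the following is a sketch under that assumption. The function name `coverage` and the sample tokens are hypothetical.

```python
def coverage(tokens, vocabulary):
    """Percentage of tokens in a text that belong to the selected vocabulary.

    Assumes token-level coverage: every occurrence counts, so frequent
    morphemes contribute more than rare ones.
    """
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in vocabulary)
    return 100.0 * covered / len(tokens)


# Hypothetical sample: four morpheme tokens, three of them in the vocabulary.
selected_vocabulary = {"a", "b"}
sample_tokens = ["a", "b", "a", "c"]
sample_coverage = coverage(sample_tokens, selected_vocabulary)
```

Under this definition a small set of high-frequency morphemes can cover most of a text, which is consistent with the 6,130 first-degree morphemes reaching well over 70% coverage on their own.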