Search | Korea Science

Design and Implementation of a System for Constructing Thesaurus of Korean Nouns (한국어 명사의 시소러스 구축을 위한 시스템 설계 및 구현)

Lee, Jong-In;Han, Gwang-Rok;Yang, Seung-Hyeon;Kim, Yeong-Seom
- The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.347-356 / 1999
This paper presents a thesaurus construction method for producing a semantic concept hierarchy of Korean nouns and implements a system for constructing the thesaurus. The system applies a multiple-step construction method that uses bottom-up and top-down approaches complementarily to address the non-objectivity of word hierarchies, slow working speed, and the structural contradictions and incoherence of existing thesauri. It maintains objectivity by using the definition sentences of a machine-readable dictionary, and resolves structural contradiction and incoherence by referring to existing thesauri. We implement a development tool based on a client/server architecture so that a thesaurus with a massive number of entries can be constructed as quickly as possible and multiple clients can work simultaneously.

Generation of Compound Nouns Using Automatic Constructed Syntactic Patterns and Semantic Network (자동 생성한 구문패턴과 의미망을 이용한 복합명사 생성)

Im, Ji-Hui;Choe, Ho-Seop;Ock, Cheol-Young
- Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.655-658 / 2004
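This paper proposes a method for generating compound nouns using syntactic patterns and a semantic network, and presents the Compound Nouns Generator (CNG), a system designed and implemented with the proposed method. Research on compound nouns has been carried out broadly, covering both analysis and generation from morphological, syntactic, and semantic perspectives. In this paper, only headwords listed in a dictionary are recognized as compound nouns, and everything else is initially treated as a noun sequence. These are then further divided into new compound nouns and noun sequences, which gives a clear criterion for compound nouns and is intended to improve efficiency in natural language processing, information retrieval, and related areas. The proposed system improves its flexibility by automatically generating syntactic patterns for compound noun expansion. By using the corpus frequency of the compound nouns and noun sequences generated through the syntactic patterns and the semantic network, it also shows the possibility that high-frequency noun sequences can become established as compound nouns. In addition, for new compound nouns generated by comparing the noun sequences produced through the syntactic patterns and the semantic network, definitions are generated automatically from the syntactic patterns.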

A Study on Definition Related to Passive and Causative in Korean Dictionary. (피동·사동과 관련한 국어사전의 뜻풀이에 대하여)

CHOE, Ho Chol
- Korean Linguistics / v.48 / pp.333-354 / 2010
When defining words related to the passive and causative in a Korean dictionary, the meaning of a headword can be explained by linking it to other related words. The link can be expressed in two forms: one is 'passive verb causative verb of A' and the other is 'passive form causative form of A.' Whichever form the dictionary adopts, the important thing is that the content to which it refers should be correct. However, both formats are problematic because the definition of the headword then contains not semantic information but syntactic or morphological information. The generic concepts 'passive form causative form' and 'passive verb causative verb' refer to the morphological and syntactic levels respectively, whereas the specific concept 'A' refers to the semantic level. The morphological, syntactic, and semantic levels are not of the same dimension, so the sizes of their denotations cannot be compared. The way to transform the syntactic dimension 'passive verb causative verb' and the morphological dimension 'passive form causative form' into the semantic dimension is to remove 'verb' and 'form' from each. Therefore, the expression 'passive verb causative verb of A' or 'passive form causative form of A' ought to be changed to 'passive causative of A.'

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
- Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently been widely used in many fields. For example, data-driven surveys are based on analyzing the subjectivity of text data posted by users, and market research is conducted by analyzing users' review posts to quantify the reputation of a target product among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary items with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' indicates a negative meaning in many fields but not in movies. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building a sentiment lexicon in this way is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it constructs sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary items, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
https://doi.org/10.13088/jiis.2018.24.4.219
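
The classify-then-extract procedure described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_classifier` is a hypothetical keyword rule standing in for the Bi-LSTM model, the English glosses are invented, and `build_lexicon` with its "seen under only one polarity" filter is an assumed simplification, not the authors' extraction rule.

```python
from collections import Counter


def toy_classifier(gloss):
    """Hypothetical stand-in for the Bi-LSTM gloss classifier (assumption)."""
    positive_cues = {"grateful", "pleasant"}
    negative_cues = {"painful", "unpleasant"}
    tokens = set(gloss.lower().split())
    if tokens & positive_cues:
        return "positive"
    if tokens & negative_cues:
        return "negative"
    return "neutral"


def build_lexicon(glosses, classify):
    """Collect words from glosses labeled positive or negative by `classify`.

    `glosses` is an iterable of (headword, gloss) pairs. A word enters the
    lexicon only if it was seen under exactly one polarity (assumed filter).
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for _headword, gloss in glosses:
        label = classify(gloss)
        if label in counts:  # neutral glosses contribute nothing
            counts[label].update(gloss.lower().split())

    lexicon = {}
    for word in set(counts["positive"]) | set(counts["negative"]):
        pos, neg = counts["positive"][word], counts["negative"][word]
        if pos and not neg:
            lexicon[word] = "positive"
        elif neg and not pos:
            lexicon[word] = "negative"
    return lexicon


if __name__ == "__main__":
    sample = [
        ("thankful", "feeling grateful and glad"),
        ("agony", "painful and bitter feeling"),
        ("table", "piece of furniture"),
    ]
    print(build_lexicon(sample, toy_classifier))
```

Words such as "feeling" and "and" occur under both polarities in the sample and are therefore dropped, which is a crude proxy for filtering out non-sentiment words.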

A Word Embedding used Word Sense and Feature Mirror Model (단어 의미와 자질 거울 모델을 이용한 단어 임베딩)

Lee, JuSang;Shin, JoonChoul;Ock, CheolYoung
- KIISE Transactions on Computing Practices / v.23 no.4 / pp.226-231 / 2017
Word representation, an important area in natural language processing (NLP) that uses machine learning, is a method that represents a word not as text but as a distinguishable symbol. Existing word embedding methods have employed a large number of corpora to ensure that words that occur near each other within text are positioned nearby. However, corpus-based word embedding needs several corpora because of the frequency of word occurrence and the increasing number of words. In this paper, word embedding is performed using dictionary definitions and semantic relationship information (hypernyms and antonyms). Words are trained using the feature mirror model (FMM), a modified Skip-Gram (Word2Vec). Words with similar senses have similar vectors. Furthermore, it was possible to distinguish the vectors of antonyms.
https://doi.org/10.5626/KTCP.2017.23.4.226

Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
- The KIPS Transactions:PartB / v.19B no.1 / pp.63-76 / 2012
We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition and semantic relation information extracted from a lexical semantic network (U-WIN) and dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. The compound noun decomposition phase selects the best candidates using noun location frequencies extracted from the Sejong corpus, re-decomposes nouns for the semantic constraint phase, and restores foreign nouns. The semantic constraint phase finds possible semantic combinations by using origin information in the dictionary and a Naive Bayes classifier, in order to decrease the computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between decomposed nouns and decides the semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns of more than 3 characters from the Standard Korean Language Dictionary. In the experiments, the accuracy of compound noun decomposition was 99.26%, and the accuracy of semantic tagging was 95.38%.
https://doi.org/10.3745/KIPSTB.2012.19B.1.063
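
The idea of choosing a decomposition from noun location frequencies can be illustrated with a small sketch. Everything specific here is an assumption: the function `best_split`, the product scoring rule, the restriction to two-way splits, and the frequency values are hypothetical and are not taken from the paper or the Sejong corpus.

```python
def best_split(compound, front_freq, rear_freq):
    """Return the two-way split with the highest positional-frequency score.

    front_freq: how often a noun appears as the first part of a compound.
    rear_freq:  how often a noun appears as the last part of a compound.
    The score (a simple product) is an assumed scoring rule.
    Returns None when no split has a positive score.
    """
    best, best_score = None, 0
    for i in range(1, len(compound)):
        head, tail = compound[:i], compound[i:]
        score = front_freq.get(head, 0) * rear_freq.get(tail, 0)
        if score > best_score:
            best, best_score = (head, tail), score
    return best


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    front_freq = {"정보": 120, "정": 3}
    rear_freq = {"검색": 80}
    print(best_split("정보검색", front_freq, rear_freq))
```

A fuller system would also need recursive splitting for compounds with three or more parts, along with the re-decomposition and foreign-noun restoration steps mentioned in the abstract.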

Automatic Construction of Syntactic Relation in U-WIN (U-WIN의 구문관계 자동구축 방법)

Im, Jihui;Kim, Dongmyoung;Choe, Hoseop;Yoon, Hwa-Mook;Ock, Cheolyoung
- Annual Conference on Human and Language Technology / 2007.10a / pp.84-90 / 2007
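Whereas general lexical networks have been researched and developed around link structures based on semantic relations, U-WIN is being built by applying lexical relations that extend the scope of semantic relations, including conceptual, morphological, and syntactic relations in addition to semantic relations. This study presents a method for automatically constructing syntactic relations, one of the lexical relations in U-WIN. First, candidate nouns that can form a syntactic relation were extracted from the usage examples of predicates on the basis of sentence pattern information, and these candidate nouns could be extracted accurately and diversely for each fine-grained sense of a predicate. However, because U-WIN treats each definition of a polysemous word as a separate lexical item, setting a syntactic relation between lexical items requires deciding on one sense among the several senses of a candidate noun. This study therefore disambiguated the senses of the candidate nouns by applying example matching rules, syntactic patterns, and semantic similarity in turn. It also used the frequency information of syntactic patterns to extract nouns that do not appear in the examples but can form a syntactic relation, thereby extending the syntactic relations. By building syntactic relations with predicates, this work can lay the foundation for a noun-centered lexical network that can be used widely in morphological analysis, syntactic analysis, semantic analysis, and other tasks.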

Development and Application of Robot Contents for Symbolic Vocal Language Learning of Young Children (상징적 음성언어 교육을 위한 유아 로봇 콘텐츠 개발 및 적용)

Kim, Jeong-Ho;Han, Jeong-Hye;Kim, Dong-Ho
- Journal of The Korean Association of Information Education / v.13 no.2 / pp.205-214 / 2009
Symbolic vocal language, which describes external sounds or expresses the shapes, phenomena, movements, and states of things in nature, provides images that can be envisioned in the mind and sets the mood of a whole piece of writing. Teaching symbolic vocal language calls for activities that avoid one-way explanation of dictionary definitions and stimulate students' thinking, as well as engaging activities such as songs and comic books that students can understand by themselves. Therefore, in this study, symbolic vocal language is developed as robot content for symbolic vocal language learning activities, and the possibility of using robots for education is reviewed by comparing changes in the affective domain and in achievement after the learning activities. In achievement tests on three types of words (those imitating sound, shape, and movement) after learning activities using a robot and a computer, there was no significant difference for words imitating sound and shape. However, the robot-based learning activities showed a significant difference in terms of interest, confidence, and understanding.

Analysis of High School Students' Understanding Levels about Earth Science terms Written in Chinese Characters (한자로 된 지구과학 용어에 대한 고등학생의 이해 수준)

Jeong, Jin-Woo;Park, Hee-Moo;Jung, Jae-Gu
- Journal of the Korean earth science society / v.25 no.5 / pp.303-314 / 2004
The purpose of this study is to analyze high school students' levels of understanding of earth science terms written in Chinese characters according to learners' characteristics. To investigate how responses vary according to these characteristics, appropriate scientific terms were first selected, and corresponding questions about them were then presented in three types: a Korean characters type, a picture type, and a Korean and Chinese characters type. The questions were given, through a paper test and interviews, to fifteen students from general high schools in Jecheon, Chungbuk Province. The results of the study are as follows: students at the formal operational level and field-independent students answered very well, and the Chinese characters type and picture type of terms improved students' understanding and memorization. Generally, Chinese characters have more positive than negative influence on learning. Therefore, when scientific terms are delivered to students in Chinese characters in class, it seems more effective to explain the sound and meaning of the terms in detail and to give learners enough time to draw a picture of each term by themselves.

A Korean Homonym Disambiguation System Using Refined Semantic Information and Thesaurus (정제된 의미정보와 시소러스를 이용한 동형이의어 분별 시스템)

Kim Jun-Su;Ock Cheol-Young
- The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.829-840 / 2005
Word sense disambiguation (WSD) is one of the most difficult problems in Korean information processing. We propose a WSD model that filters semantic information using the specific characteristics of dictionary definitions and adds information useful for sense determination, such as statistical, distance, and case information. We also propose a model that can resolve the issues resulting from the scarcity of semantic information data, based on the word hierarchy system (thesaurus) developed by Ulsan University's UOU Word Intelligent Network, a dictionary-based toxicological database. Among the WSD models elaborated in this study, the one using statistical, distance, and case information along with the thesaurus (hereinafter the 'SDJ-X model') performed best. In an experiment conducted on the sense-tagged corpus of 1,500,000 eojeols provided by the Sejong project, the SDJ-X model recorded improvements over the maximum frequency word sense determination (maximum frequency determination, MFC, accuracy baseline) of 18.87% (21.73% for nouns and inter-eojeol distance weights by 10.49% (8.84% for nouns, 11.51% for verbs). Finally, the accuracy of the SDJ-X model was higher than that of the model using only statistical, distance, and case information without the thesaurus, by a margin of 6.12% (5.29% for nouns, 6.64% for verbs).
https://doi.org/10.3745/KIPSTB.2005.12B.7.829
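
The maximum frequency baseline against which the abstract reports improvements can be sketched as below. This is a minimal illustration, not the paper's evaluation code: the function names, the sense labels, and the tiny sense-tagged sample are all hypothetical.

```python
from collections import Counter, defaultdict


def train_most_frequent_sense(tagged):
    """Map each word to the sense it carries most often in the training data.

    `tagged` is a list of (word, sense) pairs from a sense-tagged corpus.
    """
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(model, tagged):
    """Fraction of (word, sense) pairs for which the model predicts the tagged sense."""
    correct = sum(1 for word, sense in tagged if model.get(word) == sense)
    return correct / len(tagged)


if __name__ == "__main__":
    # Hypothetical sense-tagged data for two Korean homonyms.
    train = [("배", "pear"), ("배", "pear"), ("배", "ship"), ("눈", "eye")]
    test = [("배", "pear"), ("배", "ship"), ("눈", "eye"), ("눈", "snow")]
    baseline = train_most_frequent_sense(train)
    print(baseline, accuracy(baseline, test))
```

Because the baseline ignores context entirely, any gain from distance, case, or thesaurus information shows up directly as an accuracy difference against it.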

