• Title/Summary/Keyword: WordNet

Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary (베트남어 사전을 사용한 베트남어 SentiWordNet 구축)

  • Vu, Xuan-Son;Park, Seong-Bae
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.745-748 / 2014
  • SentiWordNet is an important lexical resource supporting sentiment analysis in opinion mining applications. In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN). SentiWordNet is typically generated from WordNet, in which each synset has numerical scores indicating its opinion polarities. Many previous studies obtained these scores by applying a machine learning method to WordNet. Unfortunately, no Vietnamese WordNet was available at the time of writing. Therefore, we propose a method to construct VSWN from a Vietnamese dictionary rather than from WordNet. We show the effectiveness of the proposed method by automatically generating a VSWN with 39,561 synsets. The method is evaluated experimentally on 266 synsets with respect to positivity and negativity. It achieves results competitive with the English SentiWordNet, differing from it by 0.066 on the positivity set and 0.052 on the negativity set.
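
The abstract reports differences of 0.066 (positivity) and 0.052 (negativity) but does not say how they are computed. A minimal sketch, assuming the figure is a mean absolute difference between the scores the two lexicons assign to the same synsets; the function name, synset ids, and scores below are hypothetical.

```python
from statistics import mean


def mean_abs_difference(scores_a: dict, scores_b: dict) -> float:
    """Average |a - b| over the synset ids that both lexicons score (assumed metric)."""
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        raise ValueError("no shared synsets to compare")
    return mean(abs(scores_a[s] - scores_b[s]) for s in shared)


# Hypothetical positivity scores keyed by an aligned synset id (illustrative only).
vswn_pos = {"good#a#1": 0.625, "bad#a#1": 0.0, "beautiful#a#1": 0.75}
eng_pos = {"good#a#1": 0.75, "bad#a#1": 0.0, "beautiful#a#1": 0.875}

print(round(mean_abs_difference(vswn_pos, eng_pos), 4))  # 0.0833
```

Only synsets present in both lexicons are compared, so coverage gaps do not distort the average.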

A Methodological Consideration for Building the Korean-German WordNet (한독 워드넷 구축을 위한 기본 방법론 고찰)

  • Nam Yu-Sun
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.9 / pp.217-236 / 2004
  • The aim of this paper is to present basic knowledge about WordNet, together with several previous studies of WordNet, as a methodological foundation for building the Korean-German WordNet. As a first step, some fundamental aspects of WordNet were considered within the framework of the WordNet for English. These concerned the lexical hierarchy and the semantic relations between synsets (sets of synonymous words), such as synonymy, antonymy, hyponymy, meronymy, troponymy, and entailment. Next, EuroNet (EuroWordNet) and GermaNet, both based on the Princeton WordNet, were briefly introduced. EuroNet is a multilingual database with WordNets for several European languages (Dutch, Italian, Spanish, German, French, Czech, and Estonian). This German-related WordNet can provide important guidance for building the Korean-German WordNet. In Korea, too, various studies on a WordNet for Korean have been undertaken. Among them, the KORTERM WordNet for Korean deserves particular mention as a comprehensive system in which nouns, verbs, adjectives, and adverbs interact with one another. The KORTERM WordNet for Korean is a multilingual database with WordNets for several Asian languages (Korean, Japanese, and Chinese) and is working to incorporate further languages into this multilingual database. The Korean-German WordNet will be built following this WordNet.
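
To make the synset-and-relation structure described above concrete, here is a toy model: each synset holds its synonymous lemmas and typed links (hypernym, meronym, and so on), and a breadth-first walk collects all hypernyms up the lexical hierarchy. The class, relation labels, and example entries are illustrative assumptions, not the Princeton WordNet data format.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous lemmas plus typed links to other synsets (toy model)."""
    name: str
    lemmas: list
    relations: dict = field(default_factory=dict)

    def link(self, relation: str, target: "Synset") -> None:
        self.relations.setdefault(relation, []).append(target)


def hypernym_closure(synset: Synset) -> list:
    """Return the names of all ancestors reachable via 'hypernym' links, nearest first."""
    seen, order = set(), []
    frontier = deque(synset.relations.get("hypernym", []))
    while frontier:
        node = frontier.popleft()
        if node.name in seen:
            continue
        seen.add(node.name)
        order.append(node.name)
        frontier.extend(node.relations.get("hypernym", []))
    return order


animal = Synset("animal", ["animal", "creature"])
canine = Synset("canine", ["canine", "canid"])
dog = Synset("dog", ["dog", "domestic dog"])
tail = Synset("tail", ["tail"])
canine.link("hypernym", animal)
dog.link("hypernym", canine)
dog.link("meronym", tail)  # part-whole link: a tail is part of a dog

print(hypernym_closure(dog))  # ['canine', 'animal']
```

Keeping relations as labeled lists lets the same structure hold antonymy, troponymy, or entailment links without changing the class.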

Alignment of Hypernym-Hyponym Noun Pairs between Korean and English, Based on the EuroWordNet Approach (유로워드넷 방식에 기반한 한국어와 영어의 명사 상하위어 정렬)

  • Kim, Dong-Sung
    • Language and Information / v.12 no.1 / pp.27-65 / 2008
  • This paper presents a set of methodologies for aligning hypernym-hyponym noun pairs between Korean and English, based on the EuroWordNet approach. Following the methods used in EuroWordNet, our approach makes extensive use of WordNet in four steps of the building process: 1) monolingual dictionaries are used to extract proper hypernym-hyponym noun pairs, 2) a bilingual dictionary is used to translate the extracted pairs, 3) WordNet serves as the backbone of the alignment criteria, and 4) WordNet is used to select the most similar pair among the candidates. The importance of this study lies not only in enriching the semantic links between the two languages but also in integrating lexical resources on the basis of language-specific, language-dependent structures. Our approach aims at building an accurate and detailed lexical resource with proper measures rather than at rapidly developing a generic one with NLP techniques.
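
Step 4 (choosing the most similar candidate pair) can be illustrated with a small sketch. It assumes similarity is measured by the length of the hypernym path between the translated words, one common WordNet-based measure; the paper's actual criteria may differ, and the hypernym table is a toy stand-in for WordNet.

```python
from itertools import product
from typing import Optional

# Toy English hypernym links standing in for WordNet (hypothetical data).
HYPERNYM = {
    "pear": "edible_fruit", "edible_fruit": "fruit",
    "ship": "vessel", "vessel": "craft",
    "belly": "abdomen", "abdomen": "body_part",
}


def path_distance(hyponym: str, hypernym: str, max_depth: int = 10) -> Optional[int]:
    """Count hypernym steps from hyponym up to hypernym; None if it is never reached."""
    node, steps = hyponym, 0
    while node is not None and steps <= max_depth:
        if node == hypernym:
            return steps
        node, steps = HYPERNYM.get(node), steps + 1
    return None


def best_alignment(hyper_translations, hypo_translations):
    """Return the English (hypernym, hyponym) candidate pair joined by the shortest path."""
    scored = []
    for hyper, hypo in product(hyper_translations, hypo_translations):
        d = path_distance(hypo, hyper)
        if d:  # skips None (unreachable) and 0 (same word)
            scored.append((d, hyper, hypo))
    return min(scored)[1:] if scored else None


# Korean pair 과일 (fruit) > 배 (pear / ship / belly); translations are illustrative.
print(best_alignment(["fruit"], ["pear", "ship", "belly"]))  # ('fruit', 'pear')
```

The ambiguous hyponym 배 is resolved because only its "pear" reading sits below "fruit" in the hierarchy.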

A Methodological Study of Korean-German ILI Linking for the Application of the EuroNet-Based Lexical-Semantic Database (유로워드넷 기반의 어휘 데이터베이스 활용을 위한 한국어-독일어 ILI 대응 방법론 연구)

  • Oh Jang-Geun
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.6 / pp.323-344 / 2002
  • EuroNet (EuroWordNet) is a multilingual database with WordNets for several European languages (Dutch, Italian, Spanish, German, French, Czech, and Estonian). Like the American WordNet for English (Princeton WordNet, Miller et al. 1990), these WordNets are structured as synsets (sets of synonymous words) with basic lexical-semantic relations between them. Each WordNet thus represents a unique language-internal system of lexical and conceptual relations. In addition, these WordNets based on the Princeton WordNet (e.g., GermaNet) are linked through an Inter-Lingual-Index (ILI). The index interconnects the languages, so that it is possible to go from the words in one language to similar words in any other language. The index also gives access to a shared top ontology of 63 semantic distinctions. This top ontology provides a common semantic framework for all languages, while language-specific properties are preserved in the individual WordNets. Among other uses, the database can serve monolingual and multilingual information retrieval. The present paper is thus a methodological study of Korean-German ILI linking for the application of the EuroNet-based lexical-semantic database. Individual lexemes in the Korean and German WordNets are first semantically differentiated by sense analysis and then linked to each other through lexical and conceptual relations (ILI). The equivalence links serve to map language-specific concepts to the ILI; they are drawn from a separate set of possible relations in the EuroNet specification. If no ILI record represents a direct equivalent of a given concept, the concept can be linked through EQ-near-synonymy, EQ-hyperonymy, or EQ-hyponymy relations.
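
The fallback rule at the end of this abstract can be sketched as follows. Relation labels follow EuroWordNet-style naming; the candidate lists, record ids, and the priority order among the fallback relations are assumptions for illustration.

```python
from typing import Optional

# Equivalence relations tried in order: direct equivalence first, then the fallbacks
# named in the abstract. The ordering among fallbacks is an assumption.
PRIORITY = ("EQ_SYNONYM", "EQ_NEAR_SYNONYM", "EQ_HAS_HYPERONYM", "EQ_HAS_HYPONYM")


def link_to_ili(candidates: dict) -> Optional[tuple]:
    """Pick (relation, ILI record id) from sense-analysis candidates, or None."""
    for relation in PRIORITY:
        records = candidates.get(relation, [])
        if records:
            return relation, records[0]
    return None


# Hypothetical candidates: Korean 쌀 ("husked, uncooked rice") has no direct ILI
# equivalent in this toy data, only the broader record "ili:rice".
ssal = {"EQ_HAS_HYPERONYM": ["ili:rice"]}
print(link_to_ili(ssal))  # ('EQ_HAS_HYPERONYM', 'ili:rice')
```

A concept is linked through a weaker relation only when no stronger one is available, which keeps direct equivalences preferred.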

Construction of Korean WordNet (한국어 워드넷의 구축)

  • Lim, Sung-Shin;Lee, Eun-Ryoung;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 2004.10d / pp.106-111 / 2004
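  • Developing a natural language processing system that understands human language requires a knowledge base for semantic processing. Many efforts have been made to bring human knowledge into computers, and ontologies and thesauri have been produced as a result. Outside Korea, the importance of knowledge bases is well recognized and much research has been carried out; representative examples include Roget's Thesaurus, WordNet, the EDR concept dictionary, CYC, and EuroWordNet. Among these, the most representative and most widely used is Princeton University's WordNet. WordNet is a lexical database of English that grew out of psycholinguistic research on human lexical knowledge and has been built by psychologists and linguists over about ten years. This paper introduces a Korean WordNet for nouns, built on the basis of WordNet using an English-Korean dictionary and a Korean dictionary, and presents the basic guidelines considered during its construction.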

The Automatic Extraction of Hypernyms and the Development of WordNet Prototype for Korean Nouns using Korean MRD (Machine Readable Dictionary) (국어사전을 이용한 한국어 명사에 대한 상위어 자동 추출 및 WordNet의 프로토타입 개발)

  • Kim, Min-Soo;Kim, Tae-Yeon;Noh, Bong-Nam
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.847-856 / 1995
  • When humans recognize nouns in a sentence, they associate them with the hyper-concepts of those nouns. For a computer to simulate this kind of word recognition, it needs a knowledge base (WordNet) of the hyper-concepts of words. Until now, no WordNet work had been carried out in Korea, because it requires a great deal of human effort and time. However, as computing power has improved dramatically and machine-readable dictionaries (MRDs) have become widely available, automatically constructing a WordNet has become more feasible. This paper proposes a method that automatically builds a WordNet of Korean nouns from the noun definitions in a Korean MRD, together with rules for extracting hyper-concepts (hypernyms) based on an analysis of the structural characteristics of Korean. The rules exploit characteristics such as the head word appearing at the end of the sentence and the distinctive structure of noun definitions. In addition, a WordNet prototype for Korean nouns is developed by combining the hypernyms produced by these rules. The hypernyms of about 2,500 sample words were extracted, and about 92 percent of them were correct.
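
The head-final rule described above can be sketched in a few lines: because the head of a Korean noun phrase comes last, the last word of a definition is taken as the hypernym, with a toy exception for light heads such as 하나 ("one of"). The rule set, the light-head list, and the example definitions are assumptions for illustration; the abstract does not give the paper's actual rules.

```python
# Hypothetical "light" heads that point back to the preceding noun,
# as in "...의 하나" ("one of ...") or "...의 일종" ("a kind of ...").
LIGHT_HEADS = {"하나", "일종", "종류"}


def strip_genitive(token: str) -> str:
    """Drop a trailing genitive particle 의 (a toy rule; real morphology is richer)."""
    return token[:-1] if token.endswith("의") and len(token) > 1 else token


def extract_hypernym(definition: str) -> str:
    """Korean is head-final, so take the last word of the definition as the hypernym."""
    tokens = definition.strip().rstrip(".").split()
    if not tokens:
        raise ValueError("empty definition")
    head = tokens[-1]
    if head in LIGHT_HEADS and len(tokens) >= 2:
        head = strip_genitive(tokens[-2])
    return head


# Illustrative definitions paraphrased for this example, not quoted from a specific MRD.
definitions = {
    "사과": "사과나무의 열매.",
    "기차": "기관차에 여러 대의 객차나 화차를 연결하여 궤도 위를 운행하는 차량.",
    "소나무": "소나뭇과 상록 교목의 하나.",
}
hierarchy = {word: extract_hypernym(text) for word, text in definitions.items()}
print(hierarchy)  # {'사과': '열매', '기차': '차량', '소나무': '교목'}
```

Combining such word-to-hypernym pairs yields the kind of prototype hierarchy the abstract describes.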

Using WordNet for the Automatic Construction of Korean Thesaurus (WordNet을 이용한 한국어 시소러스 자동 구축)

  • Lee, Chang-Ki;Lee, Geun-Bae
    • Annual Conference on Human and Language Technology / 1999.10e / pp.156-163 / 1999
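  • Recent research in natural language processing has demonstrated the need for broad and complete lexical knowledge bases. For English, such research has a long history, and the concept hierarchies currently in common use include Roget's Thesaurus and WordNet. These concept hierarchies play an important role in many NLP applications, but for other languages no widely used concept hierarchy exists. In this paper, we automatically build a concept hierarchy for Korean nouns based on Princeton University's WordNet, using a Korean-English dictionary and a Korean dictionary, thereby showing that a concept hierarchy for a new language can be built automatically from the existing hierarchy of another language. First, we show that the senses of a subset of Korean words extracted from the Korean-English dictionary and the Korean dictionary can be automatically linked to WordNet synsets by applying various WSD (Word Sense Disambiguation) methods. We then compare the coverage and accuracy of the results of each automatic mapping.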
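
The abstract does not list the WSD methods compared. As one simple illustration (an assumption here, not necessarily a method the authors used), each candidate synset can be scored by how many of a Korean word's English translations it contains, keeping only a unique winner. The synset ids and memberships below are toy data, not copied from WordNet.

```python
from collections import Counter

# Toy synset inventory (illustrative ids and memberships, not copied from WordNet).
SYNSETS = {
    "house#building": {"house", "building"},
    "house#firm": {"house", "firm", "business"},
    "dwelling#1": {"dwelling", "home", "abode", "house"},
}


def map_by_overlap(translations):
    """Score each synset by how many of the word's English translations it contains."""
    wanted = set(translations)
    scores = Counter()
    for sid, lemmas in SYNSETS.items():
        overlap = len(lemmas & wanted)
        if overlap:
            scores[sid] = overlap
    if not scores:
        return None
    best, top = scores.most_common(1)[0]
    # Refuse to guess when the best score is tied.
    if sum(1 for v in scores.values() if v == top) > 1:
        return None
    return best


# Korean 집 translated as house / home / dwelling (illustrative dictionary entry).
print(map_by_overlap(["house", "home", "dwelling"]))  # dwelling#1
```

Returning None on ties leaves hard cases for another method, which is one way to trade coverage for accuracy.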

WellnessWordNet: A WordNet for Unconstrained Subjective Well-Being Monitoring Based on Unstructured Data and Contextual Polarity (웰니스워드넷: 비정형데이터와 상황적 긍부정성에 기반하여 주관적 웰빙 상태를 무구속적으로 모니터링하기 위한 워드넷 개발)

  • Song, Yeongeun;Nam, Suhyun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.1-21 / 2016
  • IT-based subjective well-being (SWB) services, a main part of wellness IT, should measure the SWB state of individuals in an unconstrained, cost-effective manner. Commercially available sentiment analysis dictionaries may be useful for this purpose, but proper sentiment values cannot be obtained using only the words in a sentiment lexicon; therefore, a new dictionary that includes wellness vocabulary is needed. Existing sentiment dictionaries link only a single sentiment value to each sentiment word, although sentiment values may vary depending on personal traits. In this study, we develop WellnessWordNet, an extended version of the SenticNet sentiment dictionary, which we consider the best and most expressive of the existing sentiment dictionaries. Using the information provided by SenticNet, we created a database containing the wellness states (estimated values) of stress, depression, and anger to build the WellnessWordNet system. The accuracy of the system was validated through tests with live subjects. This study is novel in that i) an extended sentiment dictionary, WellnessWordNet, is developed; ii) values for wellness-state language are provided; and iii) different sentiment values (contextual polarity) are suggested for people of the same gender or age group.
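
The idea of contextual polarity, where the same word carries different values for different groups, can be sketched as a lexicon keyed by word and then by group, falling back to a group-independent value. The entry format, group labels, and stress scores are hypothetical; the abstract does not describe WellnessWordNet's actual schema.

```python
from statistics import mean
from typing import Optional

# Hypothetical entries: word -> {group: stress score}; "default" is group-independent.
LEXICON = {
    "deadline": {"default": 0.5, "20s": 0.75, "60s": 0.25},
    "holiday": {"default": -0.25},
}


def score(word: str, group: str) -> Optional[float]:
    """Group-specific value if present, else the default; None for unknown words."""
    entry = LEXICON.get(word)
    if entry is None:
        return None
    return entry.get(group, entry["default"])


def text_stress(tokens, group: str) -> Optional[float]:
    """Average the scores of the known words in a tokenized text."""
    values = [v for t in tokens if (v := score(t, group)) is not None]
    return mean(values) if values else None


print(text_stress(["deadline", "holiday", "coffee"], "20s"))  # 0.25
print(text_stress(["deadline", "holiday", "coffee"], "40s"))  # 0.125
```

The same text yields different estimates for different groups, which is the behavior a single-value lexicon cannot express.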

Open Korean WordNet (KWN): Dictionary-based Semi-Automatic Development (한국어 오픈 워드넷 (KWN) : 사전 기반의 반자동 구축)

  • Lee, In Keun;Hwang, Dosam;Hahm, Younggyun;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2014.10a / pp.193-196 / 2014
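  • This paper proposes a semi-automatic method for constructing a Korean WordNet (Open Korean WordNet: KWN) based on dictionary resources. The proposed method translates the vocabulary of the English WordNet (Princeton WordNet 3.0) and the Japanese WordNet (Japanese WordNet 1.1) using English-Korean and Japanese-Korean bilingual dictionaries classified by specialized domain. To resolve ambiguity in the translation results, it considers (1) whether the Korean translations obtained for English and for Japanese overlap, and (2) the domain information of the dictionaries and the hierarchical structure of WordNet. With the proposed method, 63,221 of 117,659 WordNet synsets (about 54%) were automatically translated to build the Korean WordNet. WordNet synset definitions can be localized into Korean by referring to the definitions in Korean dictionaries, and a definition-recommendation algorithm is proposed to support this process. Based on the proposed method, a system is developed that lets experts build the Korean WordNet collaboratively.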
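
Criterion (1), agreement between the Korean translations obtained via English and via Japanese, can be sketched as a set intersection. The function name and dictionary entries are hypothetical, and criterion (2) (domain labels and WordNet hierarchy) is not modeled. The last line checks the abstract's arithmetic: 63,221 / 117,659 is about 53.7%, reported as about 54%.

```python
def korean_candidates(synset_en, synset_ja, en_ko, ja_ko):
    """Keep Korean translations reachable from both the English and Japanese lemmas."""
    via_en = {ko for w in synset_en for ko in en_ko.get(w, [])}
    via_ja = {ko for w in synset_ja for ko in ja_ko.get(w, [])}
    shared = via_en & via_ja
    return sorted(shared) if shared else None  # None: still ambiguous, needs criterion (2)


# Illustrative dictionary entries for a "river bank" synset.
en_ko = {"bank": ["은행", "둑"]}
ja_ko = {"土手": ["둑", "제방"]}
print(korean_candidates(["bank"], ["土手"], en_ko, ja_ko))  # ['둑']

# Reported coverage: 63,221 of 117,659 synsets.
print(f"{63221 / 117659:.1%}")  # 53.7%
```

The Japanese lemma rules out the financial sense 은행 because only the embankment sense is reachable from both languages.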

Network Analysis between Uncertainty Words based on Word2Vec and WordNet (Word2Vec과 WordNet 기반 불확실성 단어 간의 네트워크 분석에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.247-271 / 2019
  • Uncertainty in scientific knowledge refers to a state in which propositions are currently neither true nor false. Existing studies have analyzed propositions written in the academic literature and have evaluated rule-based and machine-learning-based approaches on corpora. Although they recognized the importance of constructing uncertainty word lists, few attempts have been made to expand these lists by analyzing the meaning of uncertainty words. Meanwhile, network-structure analyses based on bibliometrics and text mining are widely used to understand the intellectual structure and relationships of various disciplines. In this study, therefore, semantic relations were analyzed by applying Word2Vec to existing uncertainty words. In addition, WordNet, an English lexical database and thesaurus, was used to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to uncertainty words. The semantic and lexical relationships of uncertainty words were thereby identified structurally. As a result, we identified the possibility of automatically expanding the set of uncertainty words.
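
The WordNet part of this study can be sketched as a small graph: words become nodes, hypernym, hyponym, and synonym links become edges, degree gives a rough centrality, and neighbors of the seed words that are not yet seeds become candidates for expanding the list. The triples and seed words are hypothetical, and the Word2Vec analysis is not modeled.

```python
from collections import defaultdict

# Hypothetical (word, relation, word) triples of the kind WordNet provides.
EDGES = [
    ("likely", "synonym", "probable"),
    ("possible", "synonym", "likely"),
    ("possible", "synonym", "potential"),
    ("suggest", "hypernym", "indicate"),  # read as: "indicate" is a hypernym of "suggest"
    ("uncertain", "synonym", "unsure"),
]


def build_graph(edges):
    """Undirected adjacency sets; relation labels are ignored for degree counts."""
    graph = defaultdict(set)
    for a, _relation, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_ranking(graph):
    """(degree, word) pairs, highest degree first, ties broken alphabetically."""
    return sorted(((len(nbrs), w) for w, nbrs in graph.items()), key=lambda p: (-p[0], p[1]))


def expansion_candidates(seeds, graph):
    """Words linked to any seed that are not seeds themselves."""
    return {n for s in seeds for n in graph.get(s, ())} - set(seeds)


graph = build_graph(EDGES)
seeds = {"likely", "possible", "suggest", "uncertain"}
print(degree_ranking(graph)[:2])  # [(2, 'likely'), (2, 'possible')]
print(sorted(expansion_candidates(seeds, graph)))  # ['indicate', 'potential', 'probable', 'unsure']
```

A fuller analysis would likely use a graph library and richer centrality measures; plain adjacency sets keep the sketch dependency-free.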