• Title/Summary/Keyword: Lexicon Analysis

Search Result 91, Processing Time 0.025 seconds

A Study on Lexicon Integrated Convolutional Neural Networks for Sentiment Analysis (감성 분석을 위한 어휘 통합 합성곱 신경망에 관한 연구)

  • Yoon, Joo-Sung;Kim, Hyeon-Cheol
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.916-919
    • /
    • 2017
  • 최근 딥러닝의 발달로 인해 Sentiment analysis분야에서도 다양한 기법들이 적용되고 있다. 이미지, 음성인식 분야에서 높은 성능을 보여주었던 Convolutional Neural Networks (CNN)은 최근 자연어처리 분야에서도 활발하게 연구가 진행되고 있으며 Sentiment analysis에도 효과적인 것으로 알려져 있다. 기존의 머신러닝에서는 lexicon을 이용한 기법들이 활발하게 연구되었지만 word embedding이 등장하면서 이러한 시도가 점차 줄어들게 되었다. 그러나 lexicon은 여전히 sentiment analysis에서 유용한 정보를 제공한다. 본 연구에서는 SemEval 2017 Task4에서 제공한 Twitter dataset과 다양한 lexicon corpus를 사용하여 lexicon을 CNN과 결합하였을 때 모델의 성능이 얼마큼 향상되는지에 대하여 연구하였다. 또한 word embedding과 lexicon이 미치는 영향에 대하여 분석하였다. 모델을 평가하는 metric은 positive, negative, neutral 3가지 class에 대한 macroaveraged F1 score를 사용하였다.

Generating a Korean Sentiment Lexicon Through Sentiment Score Propagation (감정점수의 전파를 통한 한국어 감정사전 생성)

  • Park, Ho-Min;Kim, Chang-Hyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.2
    • /
    • pp.53-60
    • /
    • 2020
  • Sentiment analysis is the automated process of understanding attitudes and opinions about a given topic from written or spoken text. One of the sentiment analysis approaches is a dictionary-based approach, in which a sentiment dictionary plays an much important role. In this paper, we propose a method to automatically generate Korean sentiment lexicon from the well-known English sentiment lexicon called VADER (Valence Aware Dictionary and sEntiment Reasoner). The proposed method consists of three steps. The first step is to build a Korean-English bilingual lexicon using a Korean-English parallel corpus. The bilingual lexicon is a set of pairs between VADER sentiment words and Korean morphemes as candidates of Korean sentiment words. The second step is to construct a bilingual words graph using the bilingual lexicon. The third step is to run the label propagation algorithm throughout the bilingual graph. Finally a new Korean sentiment lexicon is generated by repeatedly applying the propagation algorithm until the values of all vertices converge. Empirically, the dictionary-based sentiment classifier using the Korean sentiment lexicon outperforms machine learning-based approaches on the KMU sentiment corpus and the Naver sentiment corpus. In the future, we will apply the proposed approach to generate multilingual sentiment lexica.

Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun;Park, Jae-Hyun;Kim, Kwang-Ho;Park, Jeong-Sik;Kim, Ji-Hwan;Jang, Gil-Jin;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.9
    • /
    • pp.4264-4279
    • /
    • 2017
  • In this paper, maximum likelihood-based automatic lexicon generation using mixed-syllables is proposed for unlimited vocabulary voice interface for East Asian languages (e.g. Korean, Chinese and Japanese) in AI-assistant based interaction with mobile devices. The conventional lexicon has two inevitable problems: 1) a tedious repetition of out-of-lexicon unit additions to the lexicon, and 2) the propagation of errors during a morpheme analysis and space segmentation. The proposed method provides an automatic framework to solve the above problems. The proposed method produces a level of overall accuracy similar to one of previous methods in the presence of one out-of-lexicon word in a sentence, but the proposed method provides superior results with the absolute improvements of 1.62%, 5.58%, and 10.09% in terms of word accuracy when the number of out-of-lexicon words in a sentence was two, three and four, respectively.

Text Mining and Sentiment Analysis for Predicting Box Office Success

  • Kim, Yoosin;Kang, Mingon;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.8
    • /
    • pp.4090-4102
    • /
    • 2018
  • After emerging online communications, text mining and sentiment analysis has been frequently applied into analyzing electronic word-of-mouth. This study aims to develop a domain-specific lexicon of sentiment analysis to predict box office success in Korea film market and validate the feasibility of the lexicon. Natural language processing, a machine learning algorithm, and a lexicon-based sentiment classification method are employed. To create a movie domain sentiment lexicon, 233,631 reviews of 147 movies with popularity ratings is collected by a XML crawling package in R program. We accomplished 81.69% accuracy in sentiment classification by the Korean sentiment dictionary including 706 negative words and 617 positive words. The result showed a stronger positive relationship with box office success and consumers' sentiment as well as a significant positive effect in the linear regression for the predicting model. In addition, it reveals emotion in the user-generated content can be a more accurate clue to predict business success.

Romanian-Lexicon-Based Sentiment Analysis for Assesing Teachers' Activity

  • Barila, Adina;Danubianu, Mirela;Gradinaru, Bogdanel
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.43-50
    • /
    • 2022
  • The students' feedback is important to measure and improve teaching performance. Many teacher performance evaluation systems are based on responses to closed question, but the free text answers can contain useful information which had to be explored. In this paper we present a lexicon-based sentiment analysis to explore students' text feedback. The data was collected from a system for the evaluation of teachers by students developed and used in our university. The students comments are in Romanian language so we built a Romanian sentiment word lexicon. We used this to categorize the feeback text as positive, negative or neutral. In addition, we added a new polarity - indifferent - in order to categorize blank and "I don't answer" responses.

Extracting Multiword Sentiment Expressions by Using a Domain-Specific Corpus and a Seed Lexicon

  • Lee, Kong-Joo;Kim, Jee-Eun;Yun, Bo-Hyun
    • ETRI Journal
    • /
    • v.35 no.5
    • /
    • pp.838-848
    • /
    • 2013
  • This paper presents a novel approach to automatically generate Korean multiword sentiment expressions by using a seed sentiment lexicon and a large-scale domain-specific corpus. A multiword sentiment expression consists of a seed sentiment word and its contextual words occurring adjacent to the seed word. The multiword sentiment expressions that are the focus of our study have a different polarity from that of the seed sentiment word. The automatically extracted multiword sentiment expressions show that 1) the contextual words should be defined as a part of a multiword sentiment expression in addition to their corresponding seed sentiment word, 2) the identified multiword sentiment expressions contain various indicators for polarity shift that have rarely been recognized before, and 3) the newly recognized shifters contribute to assigning a more accurate polarity value. The empirical result shows that the proposed approach achieves improved performance of the sentiment analysis system that uses an automatically generated lexicon.

Method for Spatial Sentiment Lexicon Construction using Korean Place Reviews (한국어 장소 리뷰를 이용한 공간 감성어 사전 구축 방법)

  • Lee, Young Min;Kwon, Pil;Yu, Ki Yun;Kim, Ji Young
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.25 no.2
    • /
    • pp.3-12
    • /
    • 2017
  • Leaving positive or negative comments of places where he or she visits on location-based services is being common in daily life. The sentiment analysis of place reviews written by actual visitors can provide valuable information to potential consumers, as well as business owners. To conduct sentiment analysis of a place, a spatial sentiment lexicon that can be used as a criterion is required; yet, lexicon of spatial sentiment words has not been constructed. Therefore, this study suggested a method to construct a spatial sentiment lexicon by analyzing the place review data written by Korean internet users. Among several location categories, theme parks were chosen for this study. For this purpose, natural language processing technique and statistical techniques are used. Spatial sentiment words included the lexicon have information about sentiment polarity and probability score. The spatial sentiment lexicon constructed in this study consists of 3 tables(SSLex_SS, SSLex_single, SSLex_combi) that include 219 spatial sentiment words. Throughout this study, the sentiment analysis has conducted based on the texts written about the theme parks created on Twitter. As the accuracy of the sentiment classification was calculated as 0.714, the validity of the lexicon was verified.

A Study on the Analysis of Disaster Safety Lexicon Patterns in Social Media (소셜미디어를 통해 본 재난안전 분야 어휘 사용 양상 분석)

  • Kim, Tae-Young;Lee, Jung-Eun;Oh, Hyo-Jung
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.10
    • /
    • pp.85-93
    • /
    • 2017
  • Standardization of disaster safety lexicon is important as the most basic process for successful accident prevention and response. A lack of understanding of disaster safety lexicon leads lack of communication and information sharing, which can be a problem in communicating with appropriate responses in case of a disaster. Currently disaster and safety control agencies produce and manage heterogeneous information and they also develop and use word dictionaries individually. To solve this problem, identifying differences of disaster safety lexicon patterns by the user are essential for standardization. In this paper, we conducted lexicon patterns analysis based on social media and revealed the characteristics according to pattern types. At the result, we proposed the standardization and construction methods of disaster safety word dictionary.

An Efficient Korean Morpheme Analyzer and Synthesizer using Dictionary Information and Chart Data Structure (사전 정보와 차트 자료 구조를 이용한 효율적인 형태소 분석기 및 합성기(KoMAS))

  • 김정해;이상조
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.3
    • /
    • pp.123-131
    • /
    • 1994
  • This paper describes on the analysis of morphemes and it's synthesis being constituted of Korean word phrases. To analyze morphemes, we propose the introduction of "morph" for morpheme features in lexicon and the usage of chart data structures. it controls over the generation of unnecessary morpheme, and extracts every possible morpheme unit in a word phrase which minimized lexicon investigation by using heuristic information. Moreover, to synthesize morphemes, it is composed of every possible analyzed morphemes in word phrases to take advantage of speech and union information which can be obtained for program. Therefore, the systhesis of analyzed morphemes were designed to aid a syntactic analysis next step of natural language processing. This system for analyzing and systhesizing morpheme was to generate a word phrase by unifying syntactic and semantic features of analyzed morphemes in lexicon, and then established by C language of the personal computer.

  • PDF

Classification of Behavioral Lexicon and Definition of Upper, Lower Body Structures in Animation Character

  • Hongsik Pak;Suhyeon Choi;Taegu Lee
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.15 no.3
    • /
    • pp.103-117
    • /
    • 2023
  • This study focuses on the behavioural lexical classification for extracting animation character actions and the analysis of the character's upper and lower body movements. The behaviour and state of characters in the animation industry are crucial, and digital technology is enhancing the industry's value. However, research on animation motion application technology and behavioural lexical classification is still lacking. Therefore, this study aims to classify the predicates enabling animation motion, differentiate the upper and lower body movements of characters, and apply the behavioural lexicon's motion data. The necessity of this research lies in the potential contributions of advanced character motion technology to various industrial fields, and the use of the behavioural lexicon to elucidate and repurpose character motion. The research method applies a grammatical, behavioural, and semantic predicate classification and behavioural motion analysis based on the character's upper and lower body movements.