• Title/Summary/Keyword: Sentiment Lexicon

A Comparative Analysis of Travelers' Online Reviews among China, USA, and South Korea using Sentiment Analysis in the Era of the COVID-19 Pandemic (코로나19 팬데믹 상황에서 감성분석을 이용한 미국, 중국, 한국 여행자의 온라인 리뷰 비교 분석)

  • Hong, Junwoo;Hong, Taeho
    • Journal of Information Technology Services / v.20 no.5 / pp.159-176 / 2021
  • In this study, we compared sentiment values for tourists in the USA, China, and Korea during the COVID-19 pandemic in order to identify traveler characteristics from online reviews. We collected a total of 243,826 online hotel reviews for metropolitan cities and vacation spots in the three countries to compare business trips with vacation trips. The reviews were split into two groups: Jan. 1, 2019 to Nov. 31, 2019 for the period before the COVID-19 pandemic, and Apr. 1, 2020 to Feb. 28, 2021 for the period during it. Online reviews were categorized into 6 dimensions using an LDA model. Sentiment analysis was then performed for each of the 6 dimensions using a lexicon-based approach. We proposed an approach for analyzing the importance of each attribute by applying the 6-dimensional sentiment values to conjoint analysis. Our empirical analysis showed that the proposed approach can identify how traveler characteristics changed during the COVID-19 pandemic.
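The per-dimension lexicon scoring described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract assigns review text to dimensions with LDA, whereas the sketch substitutes hand-picked keyword sets, and both `LEXICON` and `DIMENSIONS` are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy lexicon and dimension keywords (assumptions, not from the paper).
LEXICON = {"clean": 1, "friendly": 1, "great": 1, "dirty": -1, "rude": -1, "noisy": -1}
DIMENSIONS = {
    "room": {"room", "bed", "bathroom"},
    "staff": {"staff", "reception", "service"},
}


def dimension_sentiment(reviews):
    """Return the average lexicon score of the sentences matched to each dimension."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for review in reviews:
        for sentence in review.lower().split("."):
            tokens = sentence.split()
            if not tokens:
                continue  # skip empty fragments left by the split
            score = sum(LEXICON.get(tok, 0) for tok in tokens)
            for dim, keywords in DIMENSIONS.items():
                # Keyword matching stands in for the LDA topic assignment.
                if keywords & set(tokens):
                    totals[dim] += score
                    counts[dim] += 1
    return {dim: totals[dim] / counts[dim] for dim in counts}


print(dimension_sentiment(["The room was clean. The staff was rude and noisy."]))
```

The resulting per-dimension values are the kind of input that could then be passed to a conjoint analysis, which is not shown here.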

Developing the Automated Sentiment Learning Algorithm to Build the Korean Sentiment Lexicon for Finance (재무분야 감성사전 구축을 위한 자동화된 감성학습 알고리즘 개발)

  • Su-Ji Cho;Ki-Kwang Lee;Cheol-Won Yang
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.32-41 / 2023
  • With the development of big data analysis technology, many recent studies in finance extract sentiment from text and verify its informational value. A number of prior studies use pre-defined sentiment dictionaries or machine learning methods to extract sentiment from financial documents. However, both methods are labor-intensive and subjective because they require a manual sentiment learning process. In this study, we developed a financial sentiment dictionary that automatically extracts sentiment from the body text of analyst reports using a modified Bayes rule, and we verified the performance of the model through a binary classification model that predicts actual stock price movements. The proposed financial dictionary showed about 4% better predictive power for actual stock price movements than the representative Loughran and McDonald (2011) financial dictionary. The sentiment extraction method proposed in this study enables efficient and objective judgment because it automatically learns the sentiment of words using both the change in target price and the cumulative abnormal returns. In addition, the dictionary can be easily updated by recalculating conditional probabilities. The results of this study are expected to extend readily not only to analyst reports but also to other financial texts such as performance reports, IR reports, press articles, and social media.
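To make the idea of learning word sentiment from conditional probabilities concrete, here is a minimal sketch using the plain Bayes rule with add-alpha smoothing. The abstract does not spell out the authors' modified rule, so this is not their algorithm; the `up`/`down` labels are an assumed stand-in for the target-price and abnormal-return signals, and `learn_word_sentiment` is a hypothetical name.

```python
from collections import Counter


def learn_word_sentiment(docs, alpha=1.0):
    """docs: list of (tokens, label) pairs with label 'up' or 'down'.

    Returns {word: P(up | word)} from Bayes' rule with add-alpha smoothing.
    """
    n = Counter(label for _, label in docs)
    doc_freq = {"up": Counter(), "down": Counter()}
    for tokens, label in docs:
        doc_freq[label].update(set(tokens))  # count each word once per document
    vocab = set(doc_freq["up"]) | set(doc_freq["down"])
    total = n["up"] + n["down"]
    scores = {}
    for word in vocab:
        # Likelihood P(word | class): smoothed share of class documents containing the word.
        like_up = (doc_freq["up"][word] + alpha) / (n["up"] + 2 * alpha)
        like_down = (doc_freq["down"][word] + alpha) / (n["down"] + 2 * alpha)
        prior_up = n["up"] / total
        prior_down = n["down"] / total
        evidence = like_up * prior_up + like_down * prior_down
        scores[word] = like_up * prior_up / evidence
    return scores


reports = [
    (["growth", "strong"], "up"),
    (["growth", "record"], "up"),
    (["loss", "weak"], "down"),
    (["loss", "growth"], "down"),
]
print(learn_word_sentiment(reports))
```

Because the scores depend only on document counts, updating the dictionary with new reports amounts to recomputing these probabilities, which matches the update property the abstract mentions.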

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of available data has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are most clearly expressed in text data such as news, reports, papers, and articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, both use text documents as input data and both use natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually regarded as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict whether some documents contain a positive or negative opinion. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification, so a precise definition of each is needed to distinguish between them. In this paper, we found that it is very difficult to distinguish the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis uses any kind of sentiment lexicon.
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
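The observation about strong versus weak certainty can be illustrated with a small sketch: a lexicon-based classifier whose absolute score serves as a certainty measure, evaluated on the most certain fraction of documents. The word lists, the certainty definition, and the function names are all assumptions made for illustration; the abstract does not describe the paper's lexicon or lift-chart procedure in this detail.

```python
# Hypothetical toy opinion word lists (assumptions, not the paper's lexicon).
POSITIVE = {"good", "great", "excellent", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful", "dull"}


def score(text):
    """Count positive minus negative lexicon hits in a whitespace-tokenized text."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def accuracy_by_certainty(labeled_reviews, top_fraction):
    """Accuracy over the top_fraction of reviews ranked by |score| (certainty)."""
    ranked = sorted(labeled_reviews, key=lambda r: abs(score(r[0])), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    hits = sum((score(text) > 0) == (label == "pos") for text, label in ranked[:k])
    return hits / k


reviews = [
    ("great excellent enjoyable film", "pos"),
    ("awful boring dull plot", "neg"),
    ("good but boring and dull", "neg"),
    ("bad title good film", "pos"),
]
print(accuracy_by_certainty(reviews, 0.5), accuracy_by_certainty(reviews, 1.0))
```

In this toy data the two most certain reviews are both classified correctly, while the zero-score review is not, which mirrors the pattern the abstract reports. Each prediction can also be explained by listing the lexicon words that contributed to the score.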

Fusion Approach to Targeted Opinion Detection in Blogosphere (블로고스피어에서 주제에 관한 의견을 찾는 융합적 의견탐지방법)

  • Yang, Kiduk
    • Journal of Korean Library and Information Science Society / v.46 no.1 / pp.321-344 / 2015
  • This paper presents a fusion approach to sentiment detection that combines multiple sources of evidence to retrieve blogs that contain opinions on a specific topic. Our approach to finding opinionated blogs on a topic first applies traditional information retrieval methods to retrieve blogs on the given topic and then boosts the ranks of opinionated blogs based on the opinion scores computed by multiple sentiment detection methods. The central idea of our sentiment detection strategy is to rely on a variety of complementary evidence rather than trying to optimize the use of a single source of evidence. It comprises five modules: the High Frequency module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently in opinionated documents); the Low Frequency module, which makes use of uncommon or rare terms (e.g., "sooo good") that express strong sentiments; the IU module, which leverages n-grams with IU (I and you) anchor terms (e.g., I believe, You will love); the Wilson's lexicon module, which uses a collection-independent opinion lexicon constructed from Wilson's subjectivity terms; and the Opinion Acronym module, which uses a small set of opinion acronyms (e.g., imho). The results of our study show that combining multiple sources of opinion evidence is an effective method for improving opinion detection performance.
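The retrieve-then-boost step can be sketched as a weighted combination of a topical retrieval score and the opinion scores from several modules. The abstract does not give the fusion formula, so the linear weighted sum, the `opinion_weight` parameter, and the module keys (`hf`, `lf`) below are illustrative assumptions rather than the paper's method.

```python
def fuse_and_rerank(results, weights, opinion_weight=0.5):
    """Re-rank retrieved blogs by topic score boosted with fused opinion scores.

    results: list of dicts with 'id', 'topic_score', and 'opinion'
             (a mapping of module name -> score in [0, 1]).
    weights: mapping of module name -> fusion weight (hypothetical values).
    """
    total_weight = sum(weights.values())

    def fused(result):
        # Weighted average of the module scores; missing modules count as 0.
        opinion = sum(
            weights[m] * result["opinion"].get(m, 0.0) for m in weights
        ) / total_weight
        return (1 - opinion_weight) * result["topic_score"] + opinion_weight * opinion

    return [r["id"] for r in sorted(results, key=fused, reverse=True)]


blogs = [
    {"id": "a", "topic_score": 0.9, "opinion": {"hf": 0.1, "lf": 0.0}},
    {"id": "b", "topic_score": 0.8, "opinion": {"hf": 0.9, "lf": 0.7}},
]
print(fuse_and_rerank(blogs, {"hf": 1.0, "lf": 1.0}))
```

With `opinion_weight=0` the ranking falls back to the plain topical order, which makes it easy to see how much the opinion evidence changes the result list.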

Anatomy of Sentiment Analysis of Tweets Using Machine Learning Approach

  • Misbah Iram;Saif Ur Rehman;Shafaq Shahid;Sayeda Ambreen Mehmood
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.97-106 / 2023
  • Sentiment analysis using social network platforms such as Twitter has achieved tremendous results. Twitter is an online social networking site that contains a rich amount of data. The platform is known as an information channel corresponding to different sites and categories. Tweets are most often publicly accessible, with very few limitations and security options available. Twitter also offers powerful tools that enhance its utility and a powerful search system that makes recently posted tweets publicly accessible by keyword. As a popular social medium, Twitter has the potential to interconnect information, reviews, and updates, all of which are important for engaging the targeted population. In this work, numerous methods for classifying the sentiment of tweets are discussed. There has been a lot of work in the field of sentiment analysis of Twitter data. This study provides a comprehensive analysis of the most standard and widely applicable opinion mining techniques, both machine learning based and lexicon based, along with their metrics. The proposed work helps analyze the information in tweets, where opinions are highly unstructured, heterogeneous, and polarized as positive, negative, or neutral. To validate the performance of the proposed framework, an extensive series of experiments was performed on a real-world Twitter dataset to show its effectiveness. This research also highlights recent challenges in the field of sentiment analysis along with the future scope of the proposed work.

A Method of Constructing Large-Scale Train Set Based on Sentiment Lexicon for Improving the Accuracy of Deep Learning Model (딥러닝 모델의 정확도 향상을 위한 감성사전 기반 대용량 학습데이터 구축 방안)

  • Choi, Min-Seong;Park, Sang-Min;On, Byung-Won
    • Annual Conference on Human and Language Technology / 2018.10a / pp.106-111 / 2018
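  • Sentiment analysis is a technique for analyzing the sentiment expressed in text and is one area of natural language processing. Various machine learning techniques have been studied extensively for sentiment analysis of Korean text, and with recent advances in deep learning, sentiment analysis using deep learning techniques has also become active. Achieving good performance with deep learning requires a sufficient amount of training data. However, obtaining training data suitable for sentiment analysis is not easy. To address this problem, this paper proposes a method for constructing large-scale training data using an existing sentiment lexicon.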
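One way to picture lexicon-based construction of training data is weak labeling: score unlabeled sentences with an existing lexicon and keep only those whose polarity is clear. The abstract does not describe the paper's actual construction rule, so the margin threshold, the toy English lexicon, and the whitespace tokenization below are assumptions; real Korean text would need a morphological analyzer instead of `split()`.

```python
def build_training_set(sentences, lexicon, min_margin=2):
    """Label sentences by summed lexicon polarity and keep only confident ones.

    min_margin is a hypothetical confidence threshold on the absolute score.
    """
    labeled = []
    for sentence in sentences:
        score = sum(lexicon.get(tok, 0) for tok in sentence.split())
        if abs(score) >= min_margin:
            labeled.append((sentence, "positive" if score > 0 else "negative"))
    return labeled


toy_lexicon = {"great": 1, "bad": -1, "boring": -1}
corpus = ["great great film", "bad boring bad", "great but bad"]
print(build_training_set(corpus, toy_lexicon))
```

Sentences with mixed or weak polarity are dropped rather than labeled, trading coverage for cleaner labels before the data is handed to a deep learning model.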

Opinion Bias Detection Based on Social Opinions for Twitter

  • Kwon, A-Rong;Lee, Kyung-Soon
    • Journal of Information Processing Systems / v.9 no.4 / pp.538-547 / 2013
  • In this paper, we propose a bias detection method based on personal and social opinions that express contrasting views on competing topics on Twitter. Unsupervised polarity classification is conducted to learn social opinions on targets. The tf·idf algorithm is applied to extract targets that reflect the sentiments and features of tweets. Our method addresses the lack of a sentiment lexicon when learning social opinions. To evaluate the effectiveness of our method, experiments were conducted on four issues using a Twitter test collection. The proposed method achieved significant improvements over the baselines.
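For reference, the tf·idf weighting mentioned in this abstract can be computed as below. The sketch uses one common variant (raw term count times the natural log of N over document frequency); the abstract does not state which variant the authors used, and the toy tweets are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict per document.

    tf is the raw count in the document; idf is log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # each term counted once per document
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in docs
    ]


tweets = [["galaxy", "iphone", "galaxy"], ["iphone", "battery"]]
for weights in tfidf(tweets):
    print(weights)
```

Terms that appear in every document receive a weight of zero, so the highest-weighted terms in a tweet are the ones most specific to it, which is what makes them candidates for targets.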

Cyberbullying Detection by Sentiment Analysis of Tweets' Contents Written in Arabic in Saudi Arabia Society

  • Almutairi, Amjad Rasmi;Al-Hagery, Muhammad Abdullah
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.112-119 / 2021
  • Social media has become a global means of communication in people's lives. Many people use Twitter to communicate, but its inappropriate use has negative effects on people's lives. One of the most common misuses of Twitter is cyberbullying. Most cyberbullying is written in dialectal Arabic, for which resources are scarce. For this reason, the goal of this study is to detect and classify cyberbullying on Twitter in the Arabic context in Saudi Arabia. To detect and classify tweets, Pointwise Mutual Information (PMI) is used to generate a lexicon, and a Support Vector Machine (SVM) algorithm is used for classification. Both methods are evaluated in terms of the F1-score. The F1-score with PMI is 50%, while the SVM applied to the resampled data reaches 82%. The analysis of the results shows that the SVM algorithm outperforms the PMI-based method.
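Generating a lexicon with PMI can be sketched as scoring each word by the difference between its PMI with the bullying class and its PMI with the normal class. The abstract does not give the exact formulation, so the add-one smoothing, the class names, and the English toy tokens (the study works on Arabic tweets) are assumptions for illustration.

```python
import math
from collections import Counter


def pmi_lexicon(labeled_tweets, smoothing=1.0):
    """labeled_tweets: list of (tokens, label), label in {'bullying', 'normal'}.

    Returns {word: PMI(word, bullying) - PMI(word, normal)} from document counts.
    Both classes must be present in the data.
    """
    n_class = Counter(label for _, label in labeled_tweets)
    n_word_class = Counter()
    for tokens, label in labeled_tweets:
        for word in set(tokens):
            n_word_class[(word, label)] += 1
    vocab = {word for word, _ in n_word_class}
    lexicon = {}
    for word in vocab:
        # Smoothed counts avoid log(0) for words seen in only one class.
        in_bullying = n_word_class[(word, "bullying")] + smoothing
        in_normal = n_word_class[(word, "normal")] + smoothing
        # PMI(w, c) = log2(P(w, c) / (P(w) P(c))); P(w) and N cancel in the difference.
        lexicon[word] = math.log2(
            (in_bullying / n_class["bullying"]) / (in_normal / n_class["normal"])
        )
    return lexicon


data = [
    (["you", "idiot"], "bullying"),
    (["idiot", "loser"], "bullying"),
    (["nice", "day"], "normal"),
    (["you", "nice"], "normal"),
]
print(pmi_lexicon(data))
```

Positive scores mark words associated with bullying tweets and negative scores mark words associated with normal ones; a tweet can then be classified by summing the scores of its words.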

An Automatic Expansion of Sentiment Lexicon by Using Corpus (코퍼스를 이용한 감성 사전 자동 확장)

  • Lee, Kong Joo;Seo, Hyung-Won;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2010.10a / pp.158-161 / 2010
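  • This study proposes a method that uses a basic sentiment lexicon and a large corpus to automatically extract the extended sentiment expressions used in a target corpus. The target corpus consisted of posts from viewer bulletin boards operated by broadcasters. With this method, we were able to extract the specific sentiment patterns used in the target corpus.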

Understanding the Sentiment on Gig Economy: Good or Bad?

  • NORAZMI, Fatin Aimi Naemah;MAZLAN, Nur Syazwani;SAID, Rusmawati;OK RAHMAT, Rahmita Wirza
    • The Journal of Asian Finance, Economics and Business / v.9 no.10 / pp.189-200 / 2022
  • The gig economy offers many advantages, such as flexibility, variety, independence, and lower cost. However, there are also safety concerns, a lack of regulation, uncertainty, and unsatisfactory services, which lead people to voice their opinions on social media. This paper explores consumer sentiment toward gig economy services (Grab, Foodpanda, and Airbnb) through the analysis of social media. First, the VADER lexicon was used to classify the comments into positive, negative, and neutral sentiments. Then, the comments were further classified using three machine learning algorithms: Support Vector Machine (SVM), Light Gradient Boosted Machine (LGBM), and Logistic Regression (LR). The results suggest that gig economy services in Malaysia received more positive sentiments (52%) than negative (19%) or neutral (29%) sentiments. Of the three algorithms, LGBM was the best model with the highest accuracy of 85%, followed by SVM with 84% and LR with 82%. The results of this study demonstrate the power of text mining and sentiment analysis in extracting business value and providing insight to businesses. Additionally, the approach helps gig managers and service providers understand clients' sentiments about their goods and services and make the adjustments needed to optimize satisfaction.