• Title/Summary/Keyword: Sentiment Polarity

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.97-113 / 2016
  • With the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous volumes of data for various purposes. In recent years in particular, many studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one branch of text mining, is widely studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of words. As part of this line of work, this study constructs a sentiment dictionary customized to a specific data domain. A common sentiment dictionary applied without regard to domain characteristics cannot reflect contextual expressions used only in that domain, so a dictionary customized to the domain can be expected to improve sentiment analysis performance. This study therefore proposes a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, a genre-customized dictionary is constructed for each genre, and the performance of the customized dictionaries in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary per genre and compare prediction accuracy on review ratings. The analysis shows that prediction with the customized dictionaries improves accuracy. The overall improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' yields the highest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionary. Other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary also needs to be applied to other domains, such as product reviews.
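
The SO-PMI scoring named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the seed words, the toy reviews, the review-level co-occurrence window, and the `smoothing` constant are all assumptions made for illustration. A word's score is PMI(word, positive seeds) minus PMI(word, negative seeds), which simplifies to a single log ratio of co-occurrence counts.

```python
import math
from collections import Counter

def so_pmi_scores(reviews, pos_seeds, neg_seeds, smoothing=0.01):
    """Score each non-seed word by SO-PMI using review-level co-occurrence.

    Assumption: two words co-occur when they appear in the same review.
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    with_pos = Counter()   # reviews containing the word and a positive seed
    with_neg = Counter()   # reviews containing the word and a negative seed
    vocab = set()
    n_pos = n_neg = 0      # reviews containing any positive / negative seed
    for text in reviews:
        tokens = set(text.lower().split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        n_pos += has_pos
        n_neg += has_neg
        for word in tokens - pos_seeds - neg_seeds:
            vocab.add(word)
            with_pos[word] += has_pos
            with_neg[word] += has_neg
    scores = {}
    for word in vocab:
        # SO-PMI(w) = PMI(w, pos) - PMI(w, neg)
        #           = log2( hits(w, pos) * hits(neg) / (hits(w, neg) * hits(pos)) )
        num = (with_pos[word] + smoothing) * (n_neg + smoothing)
        den = (with_neg[word] + smoothing) * (n_pos + smoothing)
        scores[word] = math.log2(num / den)
    return scores

# Hypothetical toy data, not taken from the study.
reviews = [
    "great gripping plot",
    "excellent gripping story",
    "awful dull plot",
    "terrible dull pacing",
]
scores = so_pmi_scores(reviews, {"great", "excellent"}, {"awful", "terrible"})
```

Words with a clearly positive score (here `gripping`) would be candidates for the positive side of a genre dictionary, words with a clearly negative score (`dull`) for the negative side, and words near zero (`plot`) would be left out.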

Detection of Complaints of Non-Face-to-Face Work before and during COVID-19 by Using Topic Modeling and Sentiment Analysis (동적 토픽 모델링과 감성 분석을 이용한 COVID-19 구간별 비대면 근무 부정요인 검출에 관한 연구)

  • Lee, Sun Min;Chun, Se Jin;Park, Sang Un;Lee, Tae Wook;Kim, Woo Ju
    • The Journal of Information Systems / v.30 no.4 / pp.277-301 / 2021
  • Purpose: The purpose of this study is to analyze the sentiment of the general public toward non-face-to-face work using text mining. As complaints about non-face-to-face work increase over time, they are difficult to review and analyze with traditional methods such as surveys, which are also limited in reflecting real-time issues. Approach: The proposed research model first collects and cleanses tweets posted on Twitter that relate to non-face-to-face work. Second, topics and keywords are extracted from the tweets using LDA (Latent Dirichlet Allocation), a topic modeling technique, and changes across periods are analyzed through DTM (Dynamic Topic Modeling). Third, complaints about non-face-to-face work are analyzed by classifying positive and negative polarity within the COVID-19 period. Findings: An analysis of 1.54 million tweets related to non-face-to-face work shows that, after COVID-19, the number of IDs using related words increased 7.2 times and the number of tweets increased 4.8 times. The most frequently used related words were, in order, remote jobs, cybersecurity, technical jobs, productivity, and software. Words that increased after COVID-19 concerned lockdown, dismissal, and business transformation, as well as securing business continuity and the virtual workplace. "New Normal" newly appeared as a term for a new standard. The sentiment analysis shows that negative opinions rose from 34% to 43% in the early stages of COVID-19 and then stabilized at 36%. The complaints concerned policies such as strengthening cybersecurity, promoting communication to improve work productivity, and diversifying work spaces.

Social Media Analytics to Understand the Construction Industry Sentiments

  • Shrestha, K. Joseph;Mani, Nirajan;Kisi, Krishna P.;Abdelaty, Ahmed
    • International conference on construction engineering and project management / 2022.06a / pp.712-720 / 2022
  • The use of social media to disseminate news and interact with project stakeholders is increasing over time in the construction industry. Such social media data can be analyzed to gain useful insights into the industry, such as demand for new housing construction and the satisfaction of construction workers. However, there have been only limited attempts to analyze social media data related to the construction industry. The objective of this study is to collect and analyze construction-related tweets to understand the overall sentiments of individuals and organizations about the construction industry. The study collected 87,244 tweets from April 6, 2020, to April 13, 2020, that had hashtags relevant to the construction industry. The tweets were then analyzed to evaluate their sentiment polarity (positive or negative) and sentiment intensity, or score (-1 to +1). Descriptive statistics were produced for the tweets, and the sentiment scores were visualized in a scatterplot to show their trend over time. The results show that the overall sentiment score of all the tweets was slightly positive (0.0365). Negative tweets were retweeted and marked as favorite by more users on average than positive ones. More specifically, tweets with negative sentiments were retweeted by 2,802 users on average, compared with an average retweet count of 247 for tweets with positive sentiments. This study can potentially be expanded in the future to produce a real-time indicator of the construction market, covering conditions such as increased availability of construction jobs, improved wage rates, and recession.
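
The descriptive statistics reported in the abstract above (an overall mean sentiment score and an average retweet count per polarity) can be sketched as follows. The records are hypothetical, and treating a score of exactly zero as "neutral" is an assumption; the abstract only mentions positive and negative polarity and does not say how the scores were produced.

```python
from statistics import mean

# Hypothetical records: (sentiment score in [-1, 1], retweet count).
tweets = [(0.6, 120), (-0.4, 900), (0.1, 40), (-0.7, 1500), (0.0, 10)]

def polarity(score):
    """Map a score to a polarity label; the zero case is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def summarize(records):
    """Return the overall mean score and the mean retweet count per polarity."""
    overall = mean(score for score, _ in records)
    retweets_by_polarity = {}
    for score, retweets in records:
        retweets_by_polarity.setdefault(polarity(score), []).append(retweets)
    avg_retweets = {label: mean(counts)
                    for label, counts in retweets_by_polarity.items()}
    return overall, avg_retweets

overall_score, avg_retweets = summarize(tweets)
```

Comparing `avg_retweets["negative"]` with `avg_retweets["positive"]` mirrors the kind of comparison the abstract reports between negative and positive tweets.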

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.79-88 / 2024
  • The use of social media has become part of our daily activities. Social web channels let users generate content and share their views, opinions, and experiences on particular topics, and researchers use this content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions, and sentiments. It is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area that aims to determine whether the writer of a piece of content is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets, such as lexical, sentiment-specific, and dialog-based features, extracted from the standard datasets in the area. Supervised learning approaches are applied: a generative algorithm, Naïve Bayes; discriminative machine learning algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor; and then ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results are evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.
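
The performance measures named at the end of the abstract above can be computed per stance label as in this minimal sketch. The labels and predictions are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not something the abstract specifies.

```python
def evaluate(y_true, y_pred, labels):
    """Return accuracy and per-label (precision, recall, F1) tuples."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_label = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Assumed convention: a metric with a zero denominator is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_label[label] = (precision, recall, f1)
    return accuracy, per_label

# Hypothetical gold labels and classifier output for six posts.
y_true = ["favor", "against", "neutral", "favor", "against", "favor"]
y_pred = ["favor", "against", "favor", "favor", "neutral", "against"]
accuracy, per_label = evaluate(y_true, y_pred, ["favor", "against", "neutral"])
```

Reporting the measures per label, rather than accuracy alone, shows which stance a classifier confuses most often.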

Sentiment Analysis of COVID-19 Vaccination in Saudi Arabia

  • Sawsan Alowa;Lama Alzahrani;Noura Alhakbani;Hend Alrasheed
    • International Journal of Computer Science & Network Security / v.23 no.2 / pp.13-30 / 2023
  • Since the COVID-19 vaccine became available, people have been sharing their opinions on social media about getting vaccinated, causing discussions of the vaccine to trend on Twitter alongside certain events, making the website a rich data source. This paper explores people's perceptions regarding the COVID-19 vaccine during certain events and how these events influenced public opinion about the vaccine. The data consisted of tweets sent during seven important events that were gathered within 14 days of the first announcement of each event. These data represent people's reactions to these events without including irrelevant tweets. The study targeted tweets sent in Arabic from users located in Saudi Arabia. The data were classified as positive, negative, or neutral in tone. Four classifiers were used: support vector machine (SVM), naïve Bayes (NB), logistic regression (LOGR), and random forest (RF), in addition to a deep learning model using BiLSTM. The results showed that the SVM achieved the highest accuracy, at 91%. Overall perceptions about the COVID-19 vaccine were 54% negative, 36% neutral, and 10% positive.

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.91-111 / 2016
  • For social media based marketing, it is crucial to perform sentiment analysis on the unstructured data written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from social media used in multinational settings (e.g., YouTube, Instagram). Since these texts are multilingual, companies usually translate the sentences into a target language before conducting sentiment analysis. However, because translation does not account for cultural differences and lacks a high-quality data dictionary, the true meaning of translated sentences is often misunderstood, which decreases the quality of sentiment analysis. Hence, this study proposes a method to perform multilingual sentiment analysis, focusing on Korean-Chinese cases, while avoiding language translation. To show the feasibility of the proposed idea, we compare the performance of the proposed method with that of legacy methods that rely on language translators. The results suggest that our method outperforms them in terms of RMSE and can be applied by global business institutions.
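
The RMSE comparison mentioned in the abstract above follows the usual definition: the square root of the mean squared difference between predicted and reference sentiment scores. In this minimal sketch the score values and the names `proposed` and `translated` are hypothetical and do not come from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length score lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical reference scores and two sets of predictions.
reference = [1.0, -1.0, 0.5, 0.0]
proposed = [0.5, -1.0, 0.5, 0.5]    # e.g. a translation-free method
translated = [0.0, 0.0, 0.5, 1.0]   # e.g. a translate-then-analyze baseline

rmse_proposed = rmse(reference, proposed)
rmse_translated = rmse(reference, translated)
```

A lower RMSE means the predicted sentiment scores sit closer to the reference scores, which is the sense in which one method outperforms another under this measure.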

Is Political Polarization Reinforced in the Online World?: Empirical Findings of Comments about News Articles (온라인 공간의 정치 양극화는 심화될 것인가?: 선거 기사 댓글에 대한 경험적 분석)

  • Eom, Ki-Hong;Kim, Dae-Sik
    • Informatization Policy / v.28 no.4 / pp.19-35 / 2021
  • The purpose of this research is to investigate the attributes of the online world and to analyze their influence on democracy. The research focuses on the mayoral by-elections that were held in Seoul and Busan, South Korea, on April 7, 2021. The study demonstrates the characteristics of online spaces and the polarization of the online public through news articles and user comments from the Internet. The research includes topic modeling to measure the diversity of media reports, sentiment analysis to measure online public opinion, and interrupted time series analysis to understand how a particular event influences online attitudes. A combination of these methods is used to attempt to estimate the strength of political polarity in the online environment. The study shows diverse media reports by election region and candidate, where the online public repeatedly reveals high negative and low positive attitudes towards each candidate. Moreover, political polarity can differ based on the level of interest in an election. Although voters pay less attention to a by-election than a presidential election, there is a solid political polarity in the online world. Hence, the research recommends preparing measures to alleviate the polarization as politics requires significant online participation.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to handle unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied from various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research topics in natural language processing and is widely studied in text mining. Real online reviews are open to others, so they are not only easy to collect but also affect business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether the posts on a website are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good, and they are difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. Nevertheless, a lack of accuracy is recognized, because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set.
First, as comparative models for text classification in sentiment analysis, the study adopts popular machine learning algorithms: NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of the data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination, and we tried to understand how, and how well, the models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers and massively parallel processing, whereas LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem.
Furthermore, when LSTM is applied at the CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be learned simultaneously. The combined CNN-LSTM achieved 90.33% accuracy. It is slower than CNN but faster than LSTM, and it was more accurate than the other models. In addition, each word embedding layer can be improved when the kernel is trained step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
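
For reference, the input, forget, and output gates that the abstract above compares to faucets are usually written as follows. This is the standard textbook LSTM formulation, not necessarily the exact variant the authors trained; the symbols are assumptions introduced here, with $x_t$ standing for the feature vector at step $t$ (in a CNN-LSTM, plausibly the pooled convolutional features), $h_t$ the hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because $c_t$ is carried forward additively and scaled by $f_t$, information from early tokens can persist across many steps, which is how the LSTM addresses the long-term dependency problem mentioned in the abstract.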

Product Evaluation Summarization Through Linguistic Analysis of Product Reviews (상품평의 언어적 분석을 통한 상품 평가 요약 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.17B no.1 / pp.93-98 / 2010
  • In this paper, we introduce a system that summarizes product evaluations through linguistic analysis in order to make effective use of the explosively increasing number of product reviews. Our system analyzes the polarity of product reviews by product feature, that is, by the criteria on which customers evaluate a product, such as 'design' and 'material' for the skirt product category. The system shows customers a graph as a review summary that represents the percentages of positive and negative reviews. We build an opinion word dictionary for each product feature through context-based automatic expansion from a small set of seed words, and we judge the polarity of reviews by product feature with the extracted dictionary. In experiments using product reviews from online shopping malls, our system shows an average accuracy of 69.8% in extracting the judgemental word dictionary and 81.8% in polarity resolution for each sentence.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (either positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note that includes the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to figure out the preferences and the positive and negative points for new models on the market. In the near future, the weak points of the models will be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because sentiment analysis is performed without considering aspects, the summary note reported to customers and car makers contains only the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough, and the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, a sentiment lexicon appropriate to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
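
The per-aspect positive and negative ratios described in the abstract above can be illustrated with a small sketch. It is a deliberate simplification and not the authors' algorithm: the paper mines aspects from latent topics, whereas here the aspects are given as hand-written keyword sets. The lexicon, the aspect keywords, and the sentences are all hypothetical.

```python
# Hypothetical sentiment lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"sleek": 1, "comfortable": 1, "reliable": 1,
           "noisy": -1, "cramped": -1, "slow": -1}

# Assumption: aspects are keyword sets; the paper derives them from latent topics.
ASPECTS = {
    "design": {"design", "exterior", "interior"},
    "performance": {"engine", "acceleration", "performance"},
}

def sentence_polarity(tokens):
    """Classify a tokenized sentence by summing lexicon scores."""
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def aspect_ratios(sentences):
    """Return {aspect: (positive ratio, negative ratio)} over matching sentences."""
    counts = {a: {"positive": 0, "negative": 0, "total": 0} for a in ASPECTS}
    for sentence in sentences:
        tokens = sentence.lower().replace(".", "").replace(",", "").split()
        label = sentence_polarity(tokens)
        for aspect, keywords in ASPECTS.items():
            if keywords & set(tokens):
                counts[aspect]["total"] += 1
                if label != "neutral":
                    counts[aspect][label] += 1
    return {a: (c["positive"] / c["total"], c["negative"] / c["total"])
            for a, c in counts.items() if c["total"]}

sentences = [
    "The exterior design is sleek.",
    "The interior feels cramped.",
    "The engine is noisy and slow.",
    "Acceleration is reliable.",
    "The interior is fine.",
]
ratios = aspect_ratios(sentences)
```

Reporting the ratios per aspect, rather than one pair of ratios for the whole product, is what distinguishes the aspect-based summary from the corpus-level summary that the abstract criticizes.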