• Title/Summary/Keyword: Text sentiment analysis

Search Result 241, Processing Time 0.035 seconds

Analyzing Vocabulary Characteristics of Colloquial Style Corpus and Automatic Construction of Sentiment Lexicon (구어체 말뭉치의 어휘 사용 특징 분석 및 감정 어휘 사전의 자동 구축)

  • Kang, Seung-Shik;Won, HyeJin;Lee, Minhaeng
    • Smart Media Journal
    • /
    • v.9 no.4
    • /
    • pp.144-151
    • /
    • 2020
  • In a mobile environment, communication takes place via SMS text messages. Vocabularies used in SMS texts can be expected to use vocabularies of different classes from those used in general Korean literary style sentence. For example, in the case of a typical literary style, the sentence is correctly initiated or terminated and the sentence is well constructed, while SMS text corpus often replaces the component with an omission and a brief representation. To analyze these vocabulary usage characteristics, the existing colloquial style corpus and the literary style corpus are used. The experiment compares and analyzes the vocabulary use characteristics of the colloquial corpus SMS text corpus and the Naver Sentiment Movie Corpus, and the written Korean written corpus. For the comparison and analysis of vocabulary for each corpus, the part of speech tag adjective (VA) was used as a standard, and a distinctive collexeme analysis method was used to measure collostructural strength. As a result, it was confirmed that adjectives related to emotional expression such as'good-','sorry-', and'joy-' were preferred in the SMS text corpus, while adjectives related to evaluation expressions were preferred in the Naver Sentiment Movie Corpus. The word embedding was used to automatically construct a sentiment lexicon based on the extracted adjectives with high collostructural strength, and a total of 343,603 sentiment representations were automatically built.

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news repots. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scrapping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.

Topic and Sentiment Analysis on COVID19 Research in Korea Using Text Analysis (텍스트 분석을 이용한 코로나19 관련 국내논문의 토픽 및 감성연구)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.329-331
    • /
    • 2021
  • 본 연구에서는 코로나19 관련 연구논문의 연구주제를 탐색하고 동향을 검토하고 있다. 또한 감성분석을 통해 부정적인 어조가 강한 경고가 되는 주제들을 알아본다. 잠재 디리슐레 할당(LDA)를 이용하여 총 8개의 토픽을 발견하 였고, 이를 구조적 토픽 모델링(STM)과 비교하여 비교적 안정적인 결과임을 확인하였다. 또한 k-means 군집 알고리즘을 통해 각 토픽별로 세부 연구주제를 발견하였고 주성분 분석을 이용하여 이를 시각적으로 표현하였다. 감성분석을 통해 각 토픽별 긍정적, 부정적인 단어들을 살펴보고 감성점수를 계산하여 연구논문의 주된 어조를 파악하였는데, 특히 생물 의학 관련, 국제적 역학관계, 심리적 영향과 관련된 연구에서 부정적인 어조가 강한 것으로 나타나 해당 부문에 대해서 주의와 관심이 요구된다. 향후 연구자들이 연구의 방향성을 탐색하고 정책결정자들이 연구지원 사업을 결정하는데 기초자료로 활용될 수 있을 것이다.

  • PDF

Research on Sentiment Analysis in Social Media App Reviews: Focusing on Instagram (소셜 미디어 앱 리뷰에서의 감성 분석 연구: 인스타그램 중심으로)

  • Wen-Qi Li;Yu-Hang Wu
    • Science of Emotion and Sensibility
    • /
    • v.27 no.1
    • /
    • pp.69-80
    • /
    • 2024
  • This study aimed to gain valuable insights into the performance and user satisfaction of applications (apps) through a thorough analysis of Instagram user reviews collected from Google Play. The study utilized text mining and sentiment analysis techniques and systematically identified emotions and opinions embedded in user reviews to deeply understand the areas of improvement and user experiences of the app. It analyzes how Instagram reviews reflect the diverse experiences of users and how they reveal the strengths and weaknesses of the app. For this purpose, sentiment analysis using the naive Bayes algorithm was conducted, and the results were expected to aid in the improvement of Instagram's services. In addition, the study aimed to assist developers in better understanding and utilizing user feedback, ultimately contributing to enhanced user satisfaction. This study explored the complex relationship between social media usage patterns and user opinions by seeking ways to provide a better user experience through these insights.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

Quantification Analysis of Soft Power through Sentiment Analysis (감성분석을 통한 소프트 파워의 수치화 분석)

  • An-Min;Bong-Hyun Kim
    • Advanced Industrial SCIence
    • /
    • v.3 no.2
    • /
    • pp.1-7
    • /
    • 2024
  • This paper deals with the topic of quantification of soft power through emotional analysis. Sentiment analysis refers to the process of detecting and analyzing emotions or emotions in various data such as text, voice, and images. Therefore, in this paper, we explored the methodology and significance of how soft power can be quantified through emotional analysis. Soft power refers to the ability of a country or organization to influence the behavior of another country or organization in a desired direction. It is built by soft factors such as culture, values, and political system rather than military or economic means. Additionally, sentiment analysis is being used as a useful tool to measure and understand these soft areas.

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect Based Sentiment Analysis (ABSA), which analyzes sentiment based on aspects that appear in the text, is drawing attention because it can be used in various business industries. ABSA is a study that analyzes sentiment by aspects for multiple aspects that a text has. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to the property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, you could set the aspect into food taste, food price, quality of service, mood of the restaurant, etc. Also, if there is a review that says, "The pasta was delicious, but the salad was not," the words "steak" and "salad," which are directly mentioned in the sentence, become the "target." So far, in ABSA, most studies have analyzed sentiment only based on aspects or targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. Instances would be when aspects or sentiment are divided or when sentiment exists without a target. For example, sentences like, "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In addition, in the case of sentences such as "Shrimp was delicious, but the price was extravagant," although the target here is "shrimp," there are opposite sentiments coexisting that are dependent on the aspect. Finally, in sentences like "The food arrived too late and is cold now." there is no target (NULL), but it transmits a negative sentiment toward the aspect "service." Like this, failure to consider both aspects and targets - when sentiment or aspect is divided or when sentiment exists without a target - creates a dual dependency problem. To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereby TASD). This study detected the limitations of existing research in the field of TASD: local contexts are not fully captured, and the number of epochs and batch size dramatically lowers the F1-score. The current model excels in spotting overall context and relations between each word. However, it struggles with phrases in the local context and is relatively slow when learning. Therefore, this study tries to improve the model's performance. To achieve the objective of this research, we additionally used auxiliary loss in aspect-sentiment classification by constructing CNN(Convolutional Neural Network) layers parallel to existing models. If existing models have analyzed aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layer-adaptive average pooling to existing models, and learning was progressed by adding additional loss values for aspect-sentiment to existing loss. In other words, when learning, the auxiliary loss, computed through CNN layers, allowed the local context to be captured more fitted. After learning, the model is designed to do aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used and the f1-score increased compared to the existing models. When the batch was 8 and epoch was 5, the difference was largest between the F1-score of existing models and this study with 29 and 45, respectively. Even when batch and epoch were adjusted, the F1-scores were higher than the existing models. It can be said that even when the batch and epoch numbers were small, they can be learned effectively compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be more accurately analyzed. Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully learned and utilized by small businesses, those that do not have much data, given that they use a pre-training model and recorded a relatively high F1-score even with limited resources.

Frequency Matrix Based Summaries of Negative and Positive Reviews

  • Almuhannad Sulaiman Alorfi
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.3
    • /
    • pp.101-109
    • /
    • 2023
  • This paper discusses the use of sentiment analysis and text summarization techniques to extract valuable information from the large volume of user-generated content such as reviews, comments, and feedback on online platforms and social media. The paper highlights the effectiveness of sentiment analysis in identifying positive and negative reviews and the importance of summarizing such text to facilitate comprehension and convey essential findings to readers. The proposed work focuses on summarizing all positive and negative reviews to enhance product quality, and the performance of the generated summaries is measured using ROUGE scores. The results show promising outcomes for the developed methods in summarizing user-generated content.

Global Text & Local Text Integration Method for Aspect-Based Sentiment Analysis (개체단위 감정분석을 위한 글로벌 텍스트&로컬 텍스트 통합 방법)

  • Lin, Te;Joe, Inwhee
    • Annual Conference of KIPS
    • /
    • 2022.11a
    • /
    • pp.414-416
    • /
    • 2022
  • 개체단위 감정분석(Aspect-Based Sentiment Analysis)는 자연어 처리에서 중요한 연구분야이다. 이는 입력 문장중에 존재하는 aspect term 의 감정 극성을 분석하는 것이 목적이다. 이 분야에서 현재 많이 사용되는 모델은 대부분 로컬 텍스트 또는 로컬 덱스트와 aspect term 사이의 관계에 주목하고 있다. 로켈 텍스트에 비해 글로벌 텍스트는 로컬 텍스트 뒤에 aspect term 내용을 추가해서 문장중에 있는 aspect term 내용을 더 깊게 학습할 수 있다고 생각한다. 본 논문에서는 새로운 masked attention 메커니즘을 사용하고 attention 메커니즘의 입력으로 글로벌 텍스트중에 있는 로컬 텍스트를 가로채어 전체 글로벌 텍스트의 내용과 융합한다. 이 방법은 semeval2014 데이터 셋에서 매우 좋은 결과를 얻었다.

Paying Back to Good Deeds: A Text Mining Approach to Explore Don-jjul as Pro-consumption Behavior

  • Hojin Choo;Sue Hyun Lee
    • Asia Marketing Journal
    • /
    • v.26 no.2
    • /
    • pp.104-128
    • /
    • 2024
  • More consumers are choosing pro-consumption for social change, but scholars know little about why and how consumers engage in pro-consumption behaviors. A newly emerged pro-consumption behavior called "Don-jjul," which appeared during the COVID-19 pandemic in South Korea, refers to compensating businesses that have engaged in altruistic actions by boosting their sales. This study used Latent Dirichlet Allocation (LDA) of topic modeling, sentiment analysis, and in-depth interviews to investigate the perceptions, motivations, and emotions regarding Don-jjul. As a result, the study revealed pro-consumers' perceptions of Don-jjul as "collective pro-consumption for contributing to social well-being." Don-jjul has two main motives: "supporting underdogs with difficulties" and "compensating good businesses economically." We also found two ambivalent emotions evoked by Don-jjul: "respect for good business owners" and "concerns regarding the misuse of Don-jjul." The results contribute to pro-consumption research for social well-being, providing business opportunities for retailers and CSR managers with a deep understanding of pro-consumers.