• Title/Summary/Keyword: Text sentiment analysis

Search Result 241, Processing Time 0.034 seconds

Social Issue Analysis Based on Sentiment of Twitter Users (트위터 사용자들의 감성을 이용한 사회적 이슈 분석)

  • Kim, Hannah;Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.11
    • /
    • pp.81-91
    • /
    • 2019
  • Recently, social network service (SNS) is actively used by public. Among them, Twitter has a lot of tweets including sentiment and it is convenient to collect data through open Aplication Programming Interface (API). In this paper, we analyze social issues and suggest the possibility of using them in marketing through sentimental information of users. In this paper, we collect twitter text about social issues and classify as positive or negative by sentiment classifier to provide qualitative analysis. We provide a quantitative analysis by analyzing the correlation between the number of like and retweet of each tweet. As a result of the qualitative analysis, we suggest solutions to attract the interest of the public or consumers. As a result of the quantitative analysis, we conclude that the positive tweet should be brief to attract the users' attention on the Twitter. As future work, we will continue to analyze various social issues.

Fine-tuning BERT-based NLP Models for Sentiment Analysis of Korean Reviews: Optimizing the sequence length (BERT 기반 자연어처리 모델의 미세 조정을 통한 한국어 리뷰 감성 분석: 입력 시퀀스 길이 최적화)

  • Sunga Hwang;Seyeon Park;Beakcheol Jang
    • Journal of Internet Computing and Services
    • /
    • v.25 no.4
    • /
    • pp.47-56
    • /
    • 2024
  • This paper proposes a method for fine-tuning BERT-based natural language processing models to perform sentiment analysis on Korean review data. By varying the input sequence length during this process and comparing the performance, we aim to explore the optimal performance according to the input sequence length. For this purpose, text review data collected from the clothing shopping platform M was utilized. Through web scraping, review data was collected. During the data preprocessing stage, positive and negative satisfaction scores were recalibrated to improve the accuracy of the analysis. Specifically, the GPT-4 API was used to reset the labels to reflect the actual sentiment of the review texts, and data imbalance issues were addressed by adjusting the data to 6:4 ratio. The reviews on the clothing shopping platform averaged about 12 tokens in length, and to provide the optimal model suitable for this, five BERT-based pre-trained models were used in the modeling stage, focusing on input sequence length and memory usage for performance comparison. The experimental results indicated that an input sequence length of 64 generally exhibited the most appropriate performance and memory usage. In particular, the KcELECTRA model showed optimal performance and memory usage at an input sequence length of 64, achieving higher than 92% accuracy and reliability in sentiment analysis of Korean review data. Furthermore, by utilizing BERTopic, we provide a Korean review sentiment analysis process that classifies new incoming review data by category and extracts sentiment scores for each category using the final constructed model.

Text Mining Driven Content Analysis of Social Perception on Schizophrenia Before and After the Revision of the Terminology (조현병과 정신분열병에 대한 뉴스 프레임 분석을 통해 본 사회적 인식의 변화)

  • Kim, Hyunji;Park, Seojeong;Song, Chaemin;Song, Min
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.53 no.4
    • /
    • pp.285-307
    • /
    • 2019
  • In 2011, the Korean Medical Association revised the name of schizophrenia to remove the social stigma for the sick. Although it has been about nine years since the revision of the terminology, no studies have quantitatively analyzed how much social awareness has changed. Thus, this study investigates the changes in social awareness of schizophrenia caused by the revision of the disease name by analyzing Naver news articles related to the disease. For text analysis, LDA topic modeling, TF-IDF, word co-occurrence, and sentiment analysis techniques were used. The results showed that social awareness of the disease was more negative after the revision of the terminology. In addition, social awareness of the former term among two terms used after the revision was more negative. In other words, the revision of the disease did not resolve the stigma.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.

A domain-specific sentiment lexicon construction method for stock index directionality (주가지수 방향성 예측을 위한 도메인 맞춤형 감성사전 구축방안)

  • Kim, Jae-Bong;Kim, Hyoung-Joong
    • Journal of Digital Contents Society
    • /
    • v.18 no.3
    • /
    • pp.585-592
    • /
    • 2017
  • As development of personal devices have made everyday use of internet much easier than before, it is getting generalized to find information and share it through the social media. In particular, communities specialized in each field have become so powerful that they can significantly influence our society. Finally, businesses and governments pay attentions to reflecting their opinions in their strategies. The stock market fluctuates with various factors of society. In order to consider social trends, many studies have tried making use of bigdata analysis on stock market researches as well as traditional approaches using buzz amount. In the example at the top, the studies using text data such as newspaper articles are being published. In this paper, we analyzed the post of 'Paxnet', a securities specialists' site, to supplement the limitation of the news. Based on this, we help researchers analyze the sentiment of investors by generating a domain-specific sentiment lexicon for the stock market.

User Experience Evaluation of Menstrual Cycle Measurement Application Using Text Mining Analysis Techniques (텍스트 마이닝 분석 기법을 활용한 월경주기측정 애플리케이션 사용자 경험 평가)

  • Wookyung Jeong;Donghee Shin
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.4
    • /
    • pp.1-31
    • /
    • 2023
  • This study conducted user experience evaluation by introducing various text mining techniques along with topic modeling techniques for mobile menstrual cycle measurement applications that are closely related to women's health and analyzed the results by combining them with a honeycomb model. To evaluate the user experience revealed in the menstrual cycle measurement application review, 47,117 Korean reviews of the menstrual cycle measurement application were collected. Topic modeling analysis was conducted to confirm the overall discourse on the user experience revealed in the review, and text network analysis was conducted to confirm the specific experience of each topic. In addition, sentimental analysis was conducted to understand the emotional experience of users. Based on this, the development strategy of the menstrual cycle measurement application was presented in terms of accuracy, design, monitoring, data management, and user management. As a result of the study, it was confirmed that the accuracy and monitoring function of the menstrual cycle measurement of the application should be improved, and it was observed that various design attempts were required. In addition, the necessity of supplementing personal information and the user's biometric data management method was also confirmed. By exploring the user experience (UX) of the menstrual cycle measurement application in-depth, this study revealed various factors experienced by users and suggested practical improvements to provide a better experience. It is also significant in that it presents a methodology by combines topic modeling and text network analysis techniques so that researchers can closely grasp vast amounts of review data in the process of evaluating user experiences.

Exploring an Optimal Feature Selection Method for Effective Opinion Mining Tasks

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.2
    • /
    • pp.171-177
    • /
    • 2019
  • This paper aims to find the most effective feature selection method for the sake of opinion mining tasks. Basically, opinion mining tasks belong to sentiment analysis, which is to categorize opinions of the online texts into positive and negative from a text mining point of view. By using the five product groups dataset such as apparel, books, DVDs, electronics, and kitchen, TF-IDF and Bag-of-Words(BOW) fare calculated to form the product review feature sets. Next, we applied the feature selection methods to see which method reveals most robust results. The results show that the stacking classifier based on those features out of applying Information Gain feature selection method yields best result.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-drive analytics techniques have been recently applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In the main concept of existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts who usually judge the positive, neutral, or negative meanings of the frequently used words from the collected text documents. In order to research the preference for a particular product, the existing approach collects (1) review posts, which are related to the product, from several product review web sites; (2) extracts sentences (or phrases) in the collection after the pre-processing step such as stemming and removal of stop words is performed; (3) classifies the polarity (either positive or negative sense) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of the positive and negative sentences (or phrases) by the total number of the sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) including the positive and negative meaning to/against the product. As a motivated example, given a product like Sonata made by Hyundai Motors, customers often want to see the summary note including what positive points are in the 'car design' aspect as well as what negative points are in thesame aspect. They also want to gain more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such an information will enable customers to make good choice when they attempt to purchase brand-new vehicles. In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on market. In the near future, the weak points of the models will be improved by the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings and is limited to apply to real applications. The main disadvantages of the existing approach is as follows: (1) The main aspects (e.g., car design, quality, performance, and service) to a product (e.g., Hyundai Sonata) are not considered. Through the sentiment analysis without considering aspects, as a result, the summary note including the positive and negative ratios of the product and top-k sentences (or phrases) with the highest sentiment scores in the entire corpus is just reported to customers and car makers. This approach is not enough and main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, the sentiment lexicon which is proper to each domain needs to be constructed. The efficient way to construct the sentiment lexicon per domain is required because the sentiment lexicon construction is labor intensive and time consuming. To address the above problems, in this article, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents the digest in which a few important sentences with the positive and negative meanings are listed in each aspect. Unlike the existing approach, using hidden topics makes experts construct the sentimental lexicon easily and quickly. Furthermore, reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithms more largely than that of the existing approach. In the experiments, we collected large review documents to the domestic vehicles such as K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method, compared with the existing method.

A Crowdsourcing-based Emotional Words Tagging Game for Building a Polarity Lexicon in Korean (한국어 극성 사전 구축을 위한 크라우드소싱 기반 감성 단어 극성 태깅 게임)

  • Kim, Jun-Gi;Kang, Shin-Jin;Bae, Byung-Chull
    • Journal of Korea Game Society
    • /
    • v.17 no.2
    • /
    • pp.135-144
    • /
    • 2017
  • Sentiment analysis refers to a way of analyzing the writer's subjective opinions or feelings through text. For effective sentiment analysis, it is essential to build emotional word polarity lexicon. This paper introduces a crowdsourcing-based game that we have developed for efficiently building a polarity lexicon in Korean. First, we collected a corpus from the relating Internet communities using a crawler, and we classified them into words using the Twitter POS analyzer. These POS-tagged words are provided as a form of mobile platform based tagging game in which the players voluntarily tagged the polarities of the words, and then the result was collected into the database. So far we have tagged the polarities of about 1200 words. We expect that our research can contribute to the Korean sentiment analysis research especially in the game domain by collecting more emotional word data in the future.

Detection of Complaints of Non-Face-to-Face Work before and during COVID-19 by Using Topic Modeling and Sentiment Analysis (동적 토픽 모델링과 감성 분석을 이용한 COVID-19 구간별 비대면 근무 부정요인 검출에 관한 연구)

  • Lee, Sun Min;Chun, Se Jin;Park, Sang Un;Lee, Tae Wook;Kim, Woo Ju
    • The Journal of Information Systems
    • /
    • v.30 no.4
    • /
    • pp.277-301
    • /
    • 2021
  • Purpose The purpose of this study is to analyze the sentiment responses of the general public to non-face-to-face work using text mining methodology. As the number of non-face-to-face complaints is increasing over time, it is difficult to review and analyze in traditional methods such as surveys, and there is a limit to reflect real-time issues. Approach This study has proposed a method of the research model, first by collecting and cleansing the data related to non-face-to-face work among tweets posted on Twitter. Second, topics and keywords are extracted from tweets using LDA(Latent Dirichlet Allocation), a topic modeling technique, and changes for each section are analyzed through DTM(Dynamic Topic Modeling). Third, the complaints of non-face-to-face work are analyzed through the classification of positive and negative polarity in the COVID-19 section. Findings As a result of analyzing 1.54 million tweets related to non-face-to-face work, the number of IDs using non-face-to-face work-related words increased 7.2 times and the number of tweets increased 4.8 times after COVID-19. The top frequently used words related to non-face-to-face work appeared in the order of remote jobs, cybersecurity, technical jobs, productivity, and software. The words that have increased after the COVID-19 were concerned about lockdown and dismissal, and business transformation and also mentioned as to secure business continuity and virtual workplace. New Normal was newly mentioned as a new standard. Negative opinions found to be increased in the early stages of COVID-19 from 34% to 43%, and then stabilized again to 36% through non-face-to-face work sentiment analysis. The complaints were, policies such as strengthening cybersecurity, activating communication to improve work productivity, and diversifying work spaces.