• Title/Summary/Keyword: news sentiment analysis

Analysis of Social Trends for Electric Scooters Using Dynamic Topic Modeling and Sentiment Analysis (동적 토픽 모델링과 감성 분석을 활용한 전동킥보드에 대한 사회적 동향 분석)

  • Kyoungok, Kim;Yerang, Shin
    • KIPS Transactions on Software and Data Engineering / v.12 no.1 / pp.19-30 / 2023
  • An electric scooter (e-scooter), a popular micro-mobility vehicle, has seen rapidly increasing use in many cities. In South Korea, e-scooter use has grown sharply since several companies launched e-scooter sharing services in a few large cities, starting with Seoul in 2018. However, e-scooters remain controversial because of issues such as parking and safety. Since perceptions of a means of transportation affect mode choice, it is necessary to track trends related to e-scooters in order to promote their use. Hence, this study aimed to analyze trends related to e-scooters. For this purpose, we analyzed news articles related to e-scooters published from 2014 to 2020, using dynamic topic modeling to extract issues and sentiment analysis to investigate how the degree of positive and negative opinion in news articles had changed. Topic modeling extracted three distinct topics, related to micro-mobility technologies, shared e-scooter services, and regulations for micro-mobility, and the proportion of the regulation topic increased as shared e-scooter services expanded in recent years. In addition, the top positive words included quick, enjoyable, and easy, whereas the top negative words included threat, complaint, and illegal. This implies that people were satisfied with the convenience of e-scooters and e-scooter sharing services, but that safety and parking issues must be addressed for micro-mobility services to become more widely used. In conclusion, this study showed how issues and social trends related to e-scooters have changed and identified the issues that need to be addressed. Moreover, the research framework combining dynamic topic modeling and sentiment analysis is expected to be helpful in identifying social trends in various areas.
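The sentiment half of this framework can be illustrated with a minimal lexicon-count sketch. It assumes articles are already tokenized and tagged with a publication year, and it uses a hypothetical six-word lexicon made of the example words above; the study's actual sentiment dictionary and its dynamic topic model are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon built from the example words in the abstract.
POSITIVE = {"quick", "enjoyable", "easy"}
NEGATIVE = {"threat", "complaint", "illegal"}


def yearly_sentiment(articles):
    """Map each year to (positive hits, negative hits, positive share).

    `articles` is an iterable of (year, tokens) pairs; the share is None when a
    year has no lexicon hits at all.
    """
    counts = defaultdict(Counter)
    for year, tokens in articles:
        year_counts = counts[year]  # register the year even if nothing matches
        for token in tokens:
            word = token.lower()
            if word in POSITIVE:
                year_counts["pos"] += 1
            elif word in NEGATIVE:
                year_counts["neg"] += 1
    trend = {}
    for year in sorted(counts):
        pos, neg = counts[year]["pos"], counts[year]["neg"]
        total = pos + neg
        trend[year] = (pos, neg, pos / total if total else None)
    return trend


articles = [
    (2019, "Quick and easy rides downtown".split()),
    (2020, "Illegal parking is a threat and a constant complaint".split()),
    (2020, "Still an enjoyable commute".split()),
]
print(yearly_sentiment(articles))
```

Following the positive share year by year mirrors the study's question of how the balance of opinion shifted over the 2014 to 2020 period.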

Analysis of Regional Fertility Gap Factors Using Explainable Artificial Intelligence (설명 가능한 인공지능을 이용한 지역별 출산율 차이 요인 분석)

  • Dongwoo Lee;Mi Kyung Kim;Jungyoon Yoon;Dongwon Ryu;Jae Wook Song
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.1 / pp.41-50 / 2024
  • Korea is facing a serious problem of historically low fertility rates, which is becoming a major social issue affecting the economy, the labor force, and national security. This study analyzes the factors contributing to regional gaps in fertility rates and derives policy implications. The government and local authorities are implementing a range of policies to address low fertility, and establishing an effective strategy requires identifying the primary factors behind regional disparities. This study identifies these factors and explores policy implications using machine learning and explainable artificial intelligence. It also examines the influence of media and public opinion on childbirth in Korea by including news and online community sentiment, as well as sentiment fear indices, as independent variables. To model the relationship between regional fertility rates and these factors, the study employs four machine learning models: multiple linear regression, XGBoost, Random Forest, and Support Vector Regression. Support Vector Regression, XGBoost, and Random Forest significantly outperform linear regression, highlighting the value of machine learning models for explaining non-linear relationships among many variables. A factor analysis using SHAP is then conducted. The unemployment rate, regional gross domestic product per capita, women's participation in economic activities, the number of crimes committed, the average age at first marriage, and private education expenses significantly affect regional fertility rates. However, the degree of impact of these factors may vary by region, suggesting the need for policies tailored to each region's characteristics rather than reliance on an overall ranking of factors.
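SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by enumerating feature coalitions and replacing absent features with baseline values. It is a simplified stand-in for the SHAP library, which uses faster approximations, and the model, feature names, and numbers are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition take their baseline value. Enumerating all
    coalitions costs O(2**n) model calls, so this only suits a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical fitted model over three made-up regional indicators.
def toy_model(z):
    unemployment, grdp_per_capita, female_participation = z
    return 1.0 - 0.05 * unemployment + 0.1 * grdp_per_capita * female_participation


x = [4.0, 3.0, 0.6]
baseline = [3.5, 2.5, 0.5]  # e.g. national averages (hypothetical)
phi = shapley_values(toy_model, x, baseline)
print([round(p, 4) for p in phi])
```

The attributions always sum to `toy_model(x) - toy_model(baseline)` (the efficiency property), which is what lets a SHAP summary rank factors by their contribution; computing them per region is one way to see the regional variation the abstract describes.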

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of data has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, text data such as news, reports, papers, and other articles clearly express users' opinions and a wide range of information. Thus, there have been active attempts to create new value by analyzing these texts. The representative techniques for text analysis are text mining and opinion mining. They share important characteristics: both use text documents as input data, and both rely on many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually regarded as a sub-concept of text mining, and in many cases the two terms are used interchangeably in the literature. Suppose that the purpose of a classification analysis is to predict the positive or negative opinion contained in a set of documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we focus on the fact that the target of the analysis is a positive or negative opinion, it can be regarded as a typical example of opinion mining. In other words, two methods (text mining and opinion mining) are available for opinion classification, and a precise definition of each is needed to distinguish between them. In this paper, we found that it is very difficult to distinguish clearly between the two methods with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis uses any kind of sentiment lexicon.
We first built two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the opinion-mining-based model achieved higher average prediction accuracy than the text mining model. Moreover, in the lift chart of the opinion-mining-based model, prediction accuracy was higher for documents classified with strong certainty than for those classified with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in similar application domains. In addition, the classification results can be clearly explained using the sentiment lexicon. This study has two limitations. First, the results cannot be generalized, mainly because the experiment was limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining and opinion mining for opinion classification. In future research, the two methods should be evaluated more precisely through more extensive experiments.
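The distinguishing criterion above, classification driven by a sentiment lexicon, can be sketched as follows. The lexicon and its scores are hypothetical, not the one built in the study; the absolute score serves as the certainty used to rank documents for a lift chart, and the same `LEXICON` can be reused on new reviews without retraining.

```python
# Hypothetical mini sentiment lexicon: word -> polarity score.
LEXICON = {"great": 2.0, "moving": 1.0, "fun": 1.0,
           "boring": -1.5, "awful": -2.0, "predictable": -0.5}


def classify(review, lexicon=LEXICON):
    """Return (label, certainty) for one review using only the sentiment lexicon."""
    tokens = (t.strip(".,!?") for t in review.lower().split())
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    # Ties (score 0) default to "positive" in this sketch.
    label = "positive" if score >= 0 else "negative"
    # |score| is the certainty: ranking documents by it gives a lift-chart ordering.
    return label, abs(score)


reviews = ["A great, moving film!", "Boring and predictable.", "Fun but a bit predictable."]
for text in sorted(reviews, key=lambda r: classify(r)[1], reverse=True):
    print(classify(text), text)
```

Because every decision traces back to specific lexicon entries, the output is also directly explainable, which is the interpretability advantage the abstract mentions.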

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / v.1 no.1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms, because models built on one data domain generally do not perform well on another. This is especially challenging for tasks such as opinion classification, which often must cope with insufficient labeled data. This study investigates the feasibility of self-training for the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for its effectiveness in sparse-data situations and its feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal, unstructured content of the blogosphere. The findings suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although its contributions vary across data domains. A significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain-transfer-based self-training strategy was implemented.
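Self-training can be outlined with a deliberately simple base classifier. This sketch assumes numeric feature vectors and uses a nearest-centroid classifier whose confidence is the distance margin to the runner-up class; the study's actual classifiers, features, and selection thresholds are not given here, so `per_round`, `rounds`, and the data are illustrative.

```python
import math


def centroids(points, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for k, v in enumerate(p):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, p):
    """Nearest-centroid label plus a confidence margin (gap to the runner-up)."""
    ranked = sorted((math.dist(p, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def self_train(src_x, src_y, tgt_x, per_round=1, rounds=10):
    """Start from labeled source-domain data, then repeatedly move the most
    confidently pseudo-labeled target-domain examples into the training set."""
    xs, ys = list(src_x), list(src_y)
    pool = list(tgt_x)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(xs, ys)  # retrain on the current (growing) training set
        pool.sort(key=lambda p: predict(cents, p)[1], reverse=True)
        for p in pool[:per_round]:
            xs.append(p)
            ys.append(predict(cents, p)[0])
        pool = pool[per_round:]
    return centroids(xs, ys)


# Hypothetical 2-D features: labeled "source" reviews and shifted, unlabeled "target" posts.
src_x = [(0.0, 0.0), (0.5, 0.0), (4.0, 4.0), (4.5, 4.0)]
src_y = ["neg", "neg", "pos", "pos"]
tgt_x = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.5)]
model = self_train(src_x, src_y, tgt_x)
print(predict(model, (5.2, 5.2))[0])
```

Adding only the highest-margin examples each round is what lets the model drift gradually toward the target domain instead of absorbing noisy pseudo-labels all at once.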

Fake News Detection based on Convolutional Neural Network and Sentiment Analysis (합성곱신경망과 감성분석 기반의 가짜뉴스 탐지)

  • Lee, Tae Won;Yang, Yeongwook;Park, Ji Su;Shon, Jin Gon
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.64-67 / 2021
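  • Fake news refers to fabricated information presented in the form of news articles, and its online spread has recently accelerated with the proliferation of mobile internet devices and the popularization of social network services. To detect fake news, previous studies analyzed the components of news articles, such as the main headline, subheadline, lead, and body, as well as metadata such as the news outlet, reporter, date, and propagation path. However, because news headlines, bodies, and metadata are easy to modify, even a model trained on a large amount of data may struggle to maintain high accuracy over the long term. To address this problem, this paper analyzes contextual information using a convolutional neural network and additionally performs sentiment analysis based on long short-term memory (LSTM). We propose a fake news detection model with improved performance that uses contextual information together with sentiment change patterns, which fake news distributors cannot easily modify.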
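The idea of a sentiment change pattern can be illustrated by summarizing how sentence-level sentiment moves through an article. In the proposed model the per-sentence scores would come from an LSTM and be combined with CNN context features; here the scores are simply given as hypothetical inputs, and the four summary features are an assumption, not the paper's feature set.

```python
def sentiment_shift_features(scores):
    """Summarize how sentence-level sentiment moves through one article.

    `scores` are per-sentence polarity scores in [-1, 1], in reading order.
    """
    if not scores:
        raise ValueError("need at least one sentence score")
    pairs = list(zip(scores, scores[1:]))
    deltas = [b - a for a, b in pairs]
    return {
        "mean": sum(scores) / len(scores),
        "range": max(scores) - min(scores),
        # Average size of the jump between consecutive sentences.
        "mean_abs_shift": sum(abs(d) for d in deltas) / len(deltas) if deltas else 0.0,
        # How often sentiment crosses from positive to negative or back.
        "sign_flips": sum(1 for a, b in pairs if a * b < 0),
    }


# Hypothetical sentence scores for an article that swings between extremes.
features = sentiment_shift_features([0.2, -0.8, 0.6, -0.9])
print(features)
```

A downstream classifier would receive these values alongside the context features; rewording a headline or editing metadata leaves such a profile largely unchanged, which is the robustness argument the abstract makes.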

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Since forecasting stock movements is an important issue both academically and practically, studies on stock price prediction have been actively conducted. Stock price forecasting research can be classified by whether it uses structured or unstructured data, and more specifically into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research combining big data with stock price prediction is actively underway, and with large amounts of data available, stock prediction research mainly focuses on machine learning techniques. In particular, methods that incorporate media effects have recently attracted attention, and studies that analyze online news to forecast stock prices are becoming mainstream. Most previous studies that predict stock prices from online news rely on sentiment analysis of news: they build a separate corpus for each company and construct a dictionary that predicts stock prices by recording how prices responded in the past. Thus, existing studies have examined the impact of online news on individual companies. For example, the stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. A method that considers influences among highly related companies has also been studied recently; for example, the stock movements of Samsung Electronics are predicted using news about Samsung Electronics together with news about a highly related company such as LG Electronics. These studies examine the effects of news from a homogeneous industrial sector on an individual company, where homogeneous industries are classified according to the Global Industry Classification Standard (GICS). In other words, the existing studies assume that industries as divided by GICS are homogeneous.
However, existing studies are limited in that they do not account for highly relevant, influential companies or reflect heterogeneity within the same GICS sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome this limitation, our study proposes a methodology that captures the heterogeneous effects of the industrial sector on stock prices by applying k-means clustering. Multiple Kernel Learning (MKL) is mainly used to integrate data with different characteristics: it combines several kernels, each of which receives different data and makes predictions. To incorporate the effects of the target firm and its related firms simultaneously, we used MKL. Each kernel was assigned to predict stock prices using the financial news variables of either the target firm or one of the industry groups obtained by k-means cluster analysis. To verify that the proposed methodology is appropriate, experiments were conducted using three years of online news and stock prices. The results are as follows. (1) Information about the industrial sectors related to the target company contains meaningful information for predicting the target company's stock movements, and the machine learning algorithm has better predictive power when the news of related companies is considered together with the target company's news. (2) It is important to vary the number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous within an industrial sector, it is better to use the relational effect at the industry-group level without clustering, or with only a small number of clusters.
When stock prices are heterogeneous within an industry group, it is important to cluster the firms into groups. This study contributes by showing that firms classified into the same GICS sector can be heterogeneous, and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by GICS. It also demonstrates the effectiveness of a prediction model that reflects heterogeneity.
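The two ingredients, k-means grouping of related firms and a combination of kernels, can be sketched in plain Python. The firm profiles, `gamma`, and kernel weights are hypothetical; a real MKL solver would learn the weights and train a predictor (such as an SVM) on the combined kernel, which this sketch does not do.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points serve as initial centers (a simple,
    deterministic choice; k-means++ is the usual alternative)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * math.dist(a, b) ** 2)


def combined_kernel(x, z, weights):
    """Weighted sum of per-source kernels, the form used in multiple kernel learning.

    x and z hold one feature vector per data source (e.g. target-firm news,
    cluster-1 news, ...). MKL would learn the weights; here they are fixed inputs.
    """
    return sum(w * rbf(xs, zs) for w, xs, zs in zip(weights, x, z))


# Hypothetical firm profiles (e.g. sensitivities to two market factors).
firms = {"A": (0.9, 0.1), "B": (0.8, 0.2), "C": (0.1, 0.9), "D": (0.2, 0.8)}
labels, _ = kmeans(list(firms.values()), k=2)
print(dict(zip(firms, labels)))
```

Choosing `k` per sector is where the abstract's second finding applies: a homogeneous sector may need one group (no clustering), while a heterogeneous one benefits from several, each feeding its own kernel.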

COVID19 Related Keyword Analysis: Based on Topic Modeling and Semantic Network Analysis (코로나19 관련 키워드 분석: 토픽 모델링과 의미 연결망 네트워크 분석을 중심으로)

  • Kim, Dong-wook;Lee, Min-sang;Jeong, Jae-young;Kim, Hyun-chul
    • Journal of the Semiconductor & Display Technology / v.21 no.2 / pp.127-132 / 2022
  • During the COVID-19 pandemic, COVID-related keywords, news, and SNS data are pouring out. Using these data and LDA topic modeling, we can examine what the media report about COVID-19 and vaccines. We can also clarify how the public reacts to vaccines on social media and how this relates to the increasing number of COVID-19 patients. Using sentiment analysis, we can identify the different kinds of reports that Korean media publish and which emotions each media company predominantly uses. Through this procedure, we can identify differences between Korean and foreign media. Ultimately, this research identifies and analyzes the keywords that rose sharply during the COVID-19 period.
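LDA itself is best left to an established library, but the semantic network side can be sketched simply: link keywords that co-occur in the same document and rank them by degree centrality. The keyword lists below are hypothetical, and counting co-occurrence per document (rather than per sentence or sliding window) is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs):
    """Build an undirected keyword co-occurrence network.

    Each doc is a list of keywords; two keywords are linked once per document
    in which both appear. Returns (edge weights, degree centrality).
    """
    edges = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    nodes = {k for doc in docs for k in doc}
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Degree centrality: share of the other nodes each node is directly linked to.
    denom = max(len(nodes) - 1, 1)
    centrality = {n: len(nb) / denom for n, nb in neighbors.items()}
    return edges, centrality


docs = [["vaccine", "side effect", "booster"],
        ["vaccine", "confirmed cases"],
        ["confirmed cases", "distancing", "vaccine"]]
edges, centrality = cooccurrence_network(docs)
print(edges.most_common(2))
print(max(centrality, key=centrality.get))
```

Sorting each keyword pair keeps the edge key canonical, so ("vaccine", "booster") and ("booster", "vaccine") are counted as the same link.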

A Trend Analysis on E-sports using Social Big Data

  • Kyoung Ah YEO;Min Soo KIM
    • Journal of Sport and Applied Science / v.8 no.1 / pp.11-17 / 2024
  • Purpose: The purpose of this study was to understand trends in esports in terms of gamers' and fans' perceptions of esports, using social big data. Research design, data, and methodology: The researchers first selected keywords related to esports. Then a total of 10,138 buzz data items created on Twitter, Facebook, news media, blogs, online cafés, and online communities between November 10, 2022 and November 19, 2023 were collected and analyzed with 'Textom', a big data solution. Results: The results were as follows. First, the main news articles concerned competitions hosted by local governments and policies to revitalize the gaming industry. Second, the esports analysis using Textom showed strong interest in the adoption of esports as an official event at the Hangzhou Asian Games and in various esports competitions. In the sentiment analysis, positive content related to the development potential of the esports industry, and negative content concerned the fundamental question of whether esports is truly a sport. Third, the analysis of social big data on esports and the Olympics revealed hope that esports would be adopted as an official Olympic event, given its adoption at the Hangzhou Asian Games. Conclusions: There was a positive opinion that adopting esports as an official Olympic event could improve the quality of games, and a negative opinion that games involving actions that violate the Olympic spirit, such as murder and assault, should not be adopted as official Olympic events. Further implications were discussed.

Quality Analysis of the Request for Proposals of Public Information Systems Project : System Operational Concept (공공정보화사업 제안요청서 품질분석 : 시스템 운영 개념을 중심으로)

  • Park, Sanghwi;Kim, Byungcho
    • Journal of Information Technology Services / v.18 no.2 / pp.37-54 / 2019
  • The purpose of this study is to present an evaluation model that measures how clearly stakeholder requirements are specified in public sector software projects in the Republic of Korea. Through this evaluation model, we assessed the quality of requests for proposals (RFPs). The study also examines how the level of stakeholder requirements affects the level of system requirements. To do this, we analyzed existing research models and standards related to business and stakeholder requirements, and constructed an evaluation model based on the system operational concept document defined in ISO/IEC/IEEE 29148. The system operational concept document organizes the requirements of the organization's stakeholders and communicates the organization's intentions. The proposed evaluation model focuses on whether content related to the system operational concept is faithfully written in the RFP. It consists of three evaluation items: 'organization status', 'desired changes', and 'operational constraints'. The sample consisted of 217 RFPs extracted from the national procurement system. The analysis showed that the evaluation model was valid and internally consistent. The level of the system operational concept in the RFPs was very low, and it was also found to affect the quality of system requirements. Clearly writing stakeholders' requirements is more important than writing functional requirements. We propose a news classification method for sentiment analysis that is effective for bankruptcy prediction models.
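The abstract reports that the evaluation model was internally consistent but does not name the statistic; Cronbach's alpha is the usual measure, so the sketch below assumes it. Rows are RFPs and columns are the three evaluation items; the scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per RFP and one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-4 scores on: organization status, desired changes, operational constraints.
rfp_scores = [[2, 1, 1], [3, 2, 2], [1, 1, 0], [4, 3, 3], [2, 2, 1]]
print(round(cronbach_alpha(rfp_scores), 3))
```

Alpha rises toward 1 when items move together across RFPs, that is, when the sum of item variances is small relative to the variance of the total score.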

Analysis of YouTube's role as a new platform between media and consumers

  • Hur, Tai-Sung;Im, Jung-ju;Song, Da-hye
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.53-60 / 2022
  • In practice, because of its low entry barriers and ambiguous video regulation standards, YouTube shows fake news and biased content based on unverified facts. Therefore, this study aims to analyze the influence of the media and YouTube on individual behavior and the relationship between them. Data from YouTube and Twitter were randomly collected using Selenium, Beautiful Soup, and the Twitter API to identify the 31 most frequently mentioned keywords. Based on these 31 keywords, data were collected from YouTube, Twitter, and Naver News, and positive, negative, and neutral sentiment was classified and quantified with the VADER model of the Natural Language Toolkit (NLTK) and used as analysis data. A correlation analysis confirmed that the more negative the news, the more positive the YouTube content, and that the positive index of YouTube content is proportional to the positive and negative values on Twitter. These results show that YouTube is not consistent with the sentiment index of the news because of its nature as secondarily processed content that is influenced by other sources. In other words, processed YouTube content intuitively affects the positive and negative figures on Twitter, a communication channel. The study concludes that YouTube helps individuals discern information in the current situation, where accurate judgment of information has become difficult due to the emergence of yellow media that stimulate people's interests and instincts.
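The pipeline can be sketched as two steps: label each text from its VADER compound score, then correlate aggregate sentiment across channels. With NLTK, compound scores come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`; the sketch takes those scores as given to stay dependency-free. The ±0.05 cut-offs follow VADER's commonly recommended thresholds, and the series below are hypothetical.

```python
import math


def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-keyword aggregates: share of negative news vs. share of positive YouTube content.
news_negative = [0.30, 0.45, 0.60, 0.20]
youtube_positive = [0.35, 0.50, 0.55, 0.25]
print([vader_label(c) for c in (0.62, -0.40, 0.01)])
print(round(pearson(news_negative, youtube_positive), 3))
```

A positive coefficient between the news-negativity and YouTube-positivity series is the kind of result the abstract reports; with real data, the per-keyword shares would be computed from the labels first.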