• Title/Summary/Keyword: news sentiment analysis

Search Result 84, Processing Time 0.031 seconds

Analysis of articles on water quality accidents in the water distribution networks using big data topic modelling and sentiment analysis (빅데이터 토픽모델링과 감성분석을 활용한 물공급과정에서의 수질사고 기사 분석)

  • Hong, Sung-Jin;Yoo, Do-Guen
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1235-1249 / 2022
  • This study applied web crawling to extract big data news on water quality accidents in the water supply system and presented a step-by-step algorithm for obtaining accurate water quality accident news. In a large-scale water quality accident, development patterns such as accident recognition, accident spread, accident response, and accident resolution appear as the accident unfolds. The development of water quality accidents was therefore analyzed in detail through key keywords and sentiment analysis for each stage, based on case studies, and the implications were derived. The proposed methodology was applied to the 2020 larvae incident in Incheon Metropolitan City. As a result, we could confirm that, in a situation where the disclosure of information that directly affects consumers, such as water quality accidents, is restricted, the tone of news articles and media reports about water quality accidents with long-term damage and the degree of consumer positivity clearly change over time. This suggests the need to prepare consumer-centered policies to increase consumer positivity, although rapid restoration of facilities is very important in handling water quality accidents from the supplier's point of view.

An Analysis of the 2017 Korean Presidential Election Using Text Mining (텍스트 마이닝을 활용한 2017년 한국 대선 분석)

  • An, Eunhee;An, Jungkook
    • Journal of the Korea Convergence Society / v.11 no.5 / pp.199-207 / 2020
  • Recently, big data analysis has drawn attention in various fields as it can generate value from large amounts of data and is also used to run political campaigns or predict results. However, existing research was limited in compiling high-level information about candidates because it analyzed only specific SNS data. Therefore, this study performs news trend analysis, topic extraction, sentiment analysis, keyword analysis, and comment analysis for the 2017 presidential election of South Korea. The results show that various topics were generated and that online opinions can be extracted for the trending keywords of the respective candidates. This study also shows that portal news and comments can serve as useful tools for predicting the public's opinion on social issues. This paper advances the building of a strategic course of action by providing a method of analyzing public opinion across various fields.

Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis (데이터 세트별 Post-Training을 통한 언어 모델 최적화 연구: 금융 감성 분석을 중심으로)

  • Hui Do Jung;Jae Heon Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.1 / pp.57-67 / 2024
  • This research investigates training methods for large language models to accurately identify sentiments and comprehend information about increasing and decreasing fluctuations in the financial domain. The main goal is to identify suitable datasets that enable these models to effectively understand expressions related to financial increases and decreases. For this purpose, we selected sentences from Wall Street Journal that included relevant financial terms and sentences generated by GPT-3.5-turbo-1106 for post-training. We assessed the impact of these datasets on language model performance using Financial PhraseBank, a benchmark dataset for financial sentiment analysis. Our findings demonstrate that post-training FinBERT, a model specialized in finance, outperformed the similarly post-trained BERT, a general domain model. Moreover, post-training with actual financial news proved to be more effective than using generated sentences, though in scenarios requiring higher generalization, models trained on generated sentences performed better. This suggests that aligning the model's domain with the domain of the area intended for improvement and choosing the right dataset are crucial for enhancing a language model's understanding and sentiment prediction accuracy. These results offer a methodology for optimizing language model performance in financial sentiment analysis tasks and suggest future research directions for more nuanced language understanding and sentiment analysis in finance. This research provides valuable insights not only for the financial sector but also for language model training across various domains.

Analysis of remote learning trends in the COVID-19 period using news big data (뉴스 빅데이터를 활용한 코로나 19시기의 원격 교육 동향 분석)

  • Lee, Youngho;Koo, Dukhoi
    • 한국정보교육학회:학술대회논문집 / 2021.08a / pp.193-197 / 2021
  • The pandemic caused by COVID-19 has had impacts large and small on our society in social, economic, psychological, and other respects. In order to prevent the spread of COVID-19, various countries, including Korea, have entered into long-term home care and distance learning systems. However, distance learning experiments conducted in many countries have raised the question of whether face-to-face education can be replaced by distance learning. Therefore, in this study, public opinion, social perception, and field trends were analyzed based on media reports on distance learning. For this purpose, 2,600 articles related to distance learning from 11 newspapers and four broadcasters were collected. Based on this data, keyword trend analysis, topic modeling analysis, and sentiment analysis were performed.

Factor augmentation for cryptocurrency return forecasting (암호화폐 수익률 예측력 향상을 위한 요인 강화)

  • Yeom, Yebin;Han, Yoojin;Lee, Jaehyun;Park, Seryeong;Lee, Jungwoo;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.189-201 / 2022
  • In this study, we propose factor augmentation to improve the forecasting power for cryptocurrency returns. We consider financial and economic variables as well as psychological aspects as possible factors. To be more specific, financial and economic factors are obtained by applying principal factor analysis, and the psychological factor is summarized by news sentiment analysis. We also visualize such factors through impulse response analysis. From the modeling perspective, we consider ARIMAX as the classical model, and random forest and deep learning to accommodate nonlinear features. As a result, we show that factor augmentation reduces prediction error and that the GRU performed best among all models considered.

A study on Korean tourism trends using social big data -Focusing on sentiment analysis- (소셜 빅데이터를 활용한 한국관광 트렌드에 관한연구 -감성분석을 중심으로-)

  • Youn-hee Choi;Kyoung-mi Yoo
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.97-109 / 2024
  • In the field of domestic tourism, trend analysis of tourism consumers, both international and domestic tourists, is essential not only for the Korean tourism market but also for local and governmental tourism policy makers. We explore keywords and sentiment on social media to establish a marketing strategy plan and revitalize the domestic tourism industry through communication and information from tourism consumers. This study utilized TEXTOM 6.0 to analyze recent trends in Korean tourism. Data was collected from September 31, 2022, to August 31, 2023, using 'Korean tourism' and 'domestic tourism' as keywords, targeting blogs, cafes, and news provided by Naver, Daum, and Google. Through text mining, 100 key words and their TF-IDF values were extracted in order of frequency, and then CONCOR analysis and sentiment analysis were conducted. For Korean tourism keywords, words related to tourist destinations, travel companions and behaviors, tourism motivations and experiences, accommodation types, tourist information, and emotional connections ranked high. The results of the CONCOR analysis were categorized into five clusters related to tourist destinations, tourist information, tourist activities/experiences, tourism motivation/content, and inbound tourism. Finally, the sentiment analysis showed a high proportion of positive documents and vocabulary. This study analyzes the rapidly changing trends of Korean tourism through text mining and is expected to provide meaningful data to promote domestic tourism not only for Koreans but also for foreigners visiting Korea.
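
The frequency and TF-IDF keyword ranking mentioned in this abstract can be illustrated with a rough sketch. This is not TEXTOM's implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy documents in the usage note are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document (hypothetical helper)."""
    # Assumption: plain whitespace tokenization; real Korean text would
    # need a morphological analyzer instead.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return scores


def top_keywords(docs, k=3):
    """Rank terms by their tf-idf summed over the whole corpus."""
    totals = Counter()
    for doc_scores in tfidf_scores(docs):
        totals.update(doc_scores)
    return [term for term, _ in totals.most_common(k)]
```

For example, with the toy corpus `["jeju beach trip", "seoul palace trip", "busan beach trip"]`, the term `jeju` scores higher in the first document than `beach`, which in turn scores higher than `trip`, because rarer terms receive a larger IDF weight.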

Analyzing Topic Trends and the Relationship between Changes in Public Opinion and Stock Price based on Sentiment of Discourse in Different Industry Fields using Comments of Naver News (네이버 뉴스 댓글을 이용한 산업 분야별 담론의 감성에 기반한 주제 트렌드 및 여론의 변화와 주가 흐름의 연관성 분석)

  • Oh, Chanhee;Kim, Kyuli;Zhu, Yongjun
    • Journal of the Korean Society for information Management / v.39 no.1 / pp.257-280 / 2022
  • In this study, we analyzed comments on news articles of representative companies of the three industries (i.e., semiconductor, secondary battery, and bio industries) that had been listed as national strategic technology projects of South Korea to identify public opinions towards them. In addition, we analyzed the relationship between changes in public opinion and stock price. 'Samsung Electronics' and 'SK Hynix' in the semiconductor industry, 'Samsung SDI' and 'LG Chem' in the secondary battery industry, and 'Samsung Biologics' and 'Celltrion' in the bio-industry were selected as the representative companies and 47,452 comments of news articles about the companies that had been published from January 1, 2020, to December 31, 2020, were collected from Naver News. The comments were grouped into positive, neutral, and negative emotions, and the dynamic topics of comments over time in each group were analyzed to identify the trends of public opinion in each industry. As a result, in the case of the semiconductor industry, investment, COVID-19 related issues, trust in large companies such as Samsung Electronics, and mention of the damage caused by changes in government policy were the topics. In the case of secondary battery industries, references to investment, battery, and corporate issues were the topics. In the case of bio-industries, references to investment, COVID-19 related issues, and corporate issues were the topics. Next, to understand whether the sentiment of the comments is related to the actual stock price, for each company, the changes in the stock price and the sentiment values of the comments were compared and analyzed using visual analytics. As a result, we found a clear relationship between the changes in the sentiment value of public opinion and the stock price through the similar patterns shown in the change graphs. 
This study analyzed comments on news articles that are highly related to stock price, identified changes in public opinion trends in the COVID-19 era, and provided objective feedback to government agencies' policymaking.

Influence analysis of Internet buzz to corporate performance : Individual stock price prediction using sentiment analysis of online news (온라인 언급이 기업 성과에 미치는 영향 분석 : 뉴스 감성분석을 통한 기업별 주가 예측)

  • Jeong, Ji Seon;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.37-51 / 2015
  • Due to the development of internet technology and the rapid increase of internet data, various studies are actively conducted on how to use and analyze internet data for various purposes. In particular, in recent years, a number of studies have applied text mining techniques to overcome the limitations of relying on structured data alone. Especially, there are various studies on sentiment analysis, which scores opinions based on the distribution of polarity, such as positivity or negativity, of the vocabulary or sentences in documents. As a part of such studies, this study tries to predict the ups and downs of companies' stock prices by performing sentiment analysis on online news about the particular companies. A variety of news on companies is produced online by different economic agents, and it is diffused quickly and accessed easily on the Internet. So, based on the inefficient market hypothesis, we can expect that news information on an individual company can be used to predict the fluctuations of the company's stock price if we apply proper data analysis techniques. However, as the areas of corporate management activity differ, machine-learning-based analysis of text data requires considering the characteristics of each company. In addition, since news including positive or negative information on certain companies has various impacts on other companies or industry fields, an analysis for the prediction of the stock price of each company is necessary. Therefore, this study attempted to predict changes in the stock prices of individual companies by applying sentiment analysis to online news data. Accordingly, this study chose top companies in the KOSPI 200 as the subjects of the analysis, and collected and analyzed online news data for each company produced over two years on a representative domestic search portal service, Naver.
In addition, considering that the meanings of words differ across economic subjects, it aims to improve performance by building a lexicon for each individual company and applying it to the analysis. As a result of the analysis, the prediction accuracy differs by company, and it turned out to be 56% on average. Comparing the accuracy of stock price prediction across industry sectors, 'energy/chemical', 'consumer goods for living', and 'consumer discretionary' showed relatively higher accuracy than other industries, while sectors such as 'information technology' and 'shipbuilding/transportation' had lower accuracy. Only five representative companies were collected for each industry, so it is somewhat difficult to generalize, but it could be confirmed that the accuracy of stock price prediction differs depending on the industry sector. In addition, at the individual company level, companies such as 'Kangwon Land', 'KT&G', and 'SK Innovation' showed relatively higher prediction accuracy than other companies, while companies such as 'Young Poong', 'LG', 'Samsung Life Insurance', and 'Doosan' had a low prediction accuracy of less than 50%. In this paper, we analyzed stock price prediction performance for individual companies using pre-built company-specific vocabularies applied to online news information. We aim to improve stock price prediction by applying online news information at the level of individual companies. Based on this, in the future, it will be possible to find ways to increase stock price prediction accuracy by addressing the problem of unnecessary words being added to the sentiment dictionary.
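
The idea of a company-specific sentiment lexicon used to predict stock price direction can be sketched as follows. The abstract does not say how the lexicon was built, so the construction rule here (scoring each word by how often it appears in news preceding a rise versus a fall) is an assumption, as are the function names and the whitespace tokenizer.

```python
from collections import Counter


def build_lexicon(labeled_articles, min_count=1):
    """Build a word -> polarity map for one company (hypothetical helper).

    labeled_articles: iterable of (text, label) pairs, where label is +1 if
    the stock price rose after the article and -1 if it fell (assumed labels).
    """
    up, down = Counter(), Counter()
    for text, label in labeled_articles:
        target = up if label > 0 else down
        # Count each word once per article.
        target.update(set(text.lower().split()))
    lexicon = {}
    for word in set(up) | set(down):
        total = up[word] + down[word]
        if total >= min_count:
            # Polarity in [-1, 1]: +1 means the word was seen only before rises.
            lexicon[word] = (up[word] - down[word]) / total
    return lexicon


def predict_direction(text, lexicon):
    """Return +1 (up), -1 (down), or 0 (no signal) from summed word polarity."""
    score = sum(lexicon.get(word, 0.0) for word in text.lower().split())
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0
```

Raising `min_count` drops rarely seen words, which is one simple way to limit the unnecessary dictionary entries that the abstract identifies as a problem for future work.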

Learning Algorithms in AI System and Services

  • Jeong, Young-Sik;Park, Jong Hyuk
    • Journal of Information Processing Systems / v.15 no.5 / pp.1029-1035 / 2019
  • In recent years, artificial intelligence (AI) services have become one of the most essential means of extending human capabilities in various fields, such as face recognition for security and weather prediction. Existing AI services use various learning algorithms, such as classification, regression, and deep learning, to increase accuracy and efficiency for humans. Nonetheless, these services face many challenges, such as fake news spread on social media, stock selection and volatility delay in stock prediction systems, and inaccurate movie-based recommendation systems. In this paper, various algorithms are presented to mitigate these issues in different systems and services. Convolutional neural network algorithms are used for detecting fake news in the Korean language with a Word-Embedded model. Approaches based on k-clique and data mining increase accuracy in personalized recommendation-based services, stock selection, and volatility delay in stock prediction. Other algorithms, such as multi-level fusion processing, address the lack of a real-time database.

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when the individual is exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
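
The notion of contextual polarity, where a word that is neutral in a general dictionary becomes a sentiment word for a specific SWB factor, can be sketched roughly as below. The word lists and polarity values are made up for illustration (they are not taken from SentiWordNet or from the paper), and the averaging rule and function name are likewise assumptions.

```python
# Hypothetical general-purpose polarities in [-1.0, 1.0].
GENERAL_POLARITY = {"good": 0.6, "bad": -0.6, "deadline": 0.0, "overtime": 0.0}

# Hypothetical context-specific overrides, keyed by SWB factor.
CONTEXT_POLARITY = {
    "stress": {"deadline": -0.5, "overtime": -0.4},
}


def polarity(text, context=None):
    """Average polarity of the sentiment-bearing words in a text.

    When a context (SWB factor) is given, its overrides take precedence
    over the general dictionary, so generally neutral words can count.
    """
    overrides = CONTEXT_POLARITY.get(context, {})
    scores = [
        overrides.get(word, GENERAL_POLARITY.get(word, 0.0))
        for word in text.lower().split()
    ]
    # Ignore words that carry no polarity so they do not dilute the average.
    scored = [s for s in scores if s != 0.0]
    return sum(scored) / len(scored) if scored else 0.0
```

With these toy values, `polarity("deadline and overtime again")` is neutral under the general dictionary alone, but becomes negative when evaluated with `context="stress"`.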