• Title/Summary/Keyword: news data

Exploring Opinions on COVID-19 Vaccines through Analyzing Twitter Posts (트위터 게시물 분석을 통한 코로나바이러스감염증-19 백신에 대한 의견 탐색)

  • Jung, Woojin; Kim, Kyuli; Yoo, Seunghee; Zhu, Yongjun
    • Journal of the Korean Society for Information Management / v.38 no.4 / pp.113-128 / 2021
  • This study aimed to understand public opinion on COVID-19 vaccines by analyzing vaccine-related Twitter posts. We collected 45,413 tweets posted from March 16, 2020 to March 15, 2021 that included COVID-19 vaccine names as keywords. The 12 vaccine names used for data collection were, in descending order of the number of collected posts, 'Pfizer', 'AstraZeneca', 'Moderna', 'Janssen', 'NovaVax', 'Sinopharm', 'SinoVac', 'Sputnik V', 'Bharat', 'CanSino', 'Chumakov', and 'VECTOR'. The collected posts were analyzed both manually and automatically through keyword analysis, sentiment analysis, and topic modeling to understand opinions on the investigated vaccines. According to the results, there were generally more negative posts about vaccines than positive posts. Anxiety about the aftereffects of vaccination and distrust in the efficacy of vaccines were identified as the major negative factors. In contrast, the expectation that vaccination would suppress the spread of the coronavirus was identified as a positive social factor. Unlike previous studies that investigated opinions about COVID-19 vaccines through mass media data such as news articles, this study explores the opinions of social media users using keyword analysis, sentiment analysis, and topic modeling. In addition, governmental institutions can use the results of this study to make vaccination-promotion policies that reflect the social atmosphere.

Effects of Exposure to Cooking Show Contents on the Consumption of Agricultural Products: Focused on Potato Consumption (쿡방 콘텐츠 노출이 농식품 소비에 미치는 효과: 감자 소비를 중심으로)

  • Rah, HyungChul; Kim, Hyeon-Woong; Ko, Hyeonseok; Shin, Jaehoon; Cho, Yongbeen; Nasridinov, Aziz; Yoo, Kwan-Hee
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.400-407 / 2021
  • Recently, mukbang and cookbang (cooking shows) on TV and YouTube channels have increased, and their influence on food consumption has been growing. Several news articles have reported on the 'Baek Jong-won effect', in which consumption of an agri-food product soars after Mr. Jong-won Baek mentions it on his broadcast, and foods named after him are even on the market. In this study, Mr. Jong-won Baek, who produces influential cooking content through various media, was taken as a representative example. We evaluated whether the 'Baek Jong-won effect' exists for potato consumption, since Mr. Jong-won Baek broadcast potato recipes on TV and YouTube. After the potato recipe was first broadcast on the TV show HomeFoodRescue, the differences in potato purchase amounts before and after the broadcast were estimated using purchase-amount data from the Agri-food consumers panel and the difference-in-differences method at 6 time points (3, 6, 9, 12, 24, and 36 months). Among the time points analyzed, potato purchases after the broadcast were lower than those before it. No results suggesting the existence of a 'Baek Jong-won effect' on potato consumption through the HomeFoodRescue show were observed in this study.
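
The difference-in-differences idea mentioned in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' estimation procedure: the comparison group, the function name `did_estimate`, and all spending figures are hypothetical.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Difference-in-differences: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical purchase amounts (illustrative numbers, not data from the study).
potato_pre = [10.0, 12.0, 11.0]    # treated item, before the broadcast
potato_post = [9.0, 10.0, 11.0]    # treated item, after the broadcast
other_pre = [20.0, 22.0, 21.0]     # assumed comparison item, before
other_post = [21.0, 22.0, 23.0]    # assumed comparison item, after

effect = did_estimate(potato_pre, potato_post, other_pre, other_post)
print(f"Estimated effect: {effect:+.2f}")
```

A negative estimate in this toy setup means the treated item's purchases fell relative to the comparison item, which is the direction the abstract reports for potatoes.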

A Study on IP Camera Security Issues and Mitigation Strategies (IP 카메라 보안의 문제점 분석 및 보완 방안 연구)

  • Seungjin Shin; Jungheum Park; Sangjin Lee
    • KIPS Transactions on Computer and Communication Systems / v.12 no.3 / pp.111-118 / 2023
  • Cyber attacks are increasing worldwide, and so are attacks on personal privacy such as CCTV and IP camera hacking. IP camera hacking methods can easily be found on YouTube, social media, and the dark web, and hacking programs are even on sale. An IP camera with vulnerabilities exploited by these programs can easily be hacked even if the user changes the password regularly or uses a complex password that includes special characters, uppercase and lowercase letters, and numbers. Although news and media have raised concerns about the security of IP cameras and suggested measures to prevent damage, hacking incidents continue to occur. To prevent such damage, it is necessary to identify the causes of hacking incidents and take concrete measures. First, we analyzed weak account settings and web server vulnerabilities of IP cameras, which are the causes of IP camera hacking, and suggested solutions. In addition, as a specific countermeasure against hacking, we propose adding a function that sends a notification when an IP camera is accessed and a function that saves the connection history. With these functions, damage can be recognized immediately, and important evidence is retained for arresting criminals. Therefore, in this paper, we propose a method to increase safety from hacking by using the connection notification and logging functions of the IP camera.
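
The two functions proposed in this abstract, a connection notification and a saved connection history, could look roughly like the sketch below. Everything here is an assumption made for illustration: the class names, the `notify` callback, and the message format are hypothetical and do not come from the paper or from any camera firmware.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class ConnectionEvent:
    timestamp: datetime
    source_ip: str
    account: str


@dataclass
class ConnectionMonitor:
    # Callback that delivers the alert (push message, e-mail, ...); hypothetical.
    notify: Callable[[str], None]
    # In-memory connection history; a real device would persist this.
    history: List[ConnectionEvent] = field(default_factory=list)

    def on_connect(self, source_ip: str, account: str) -> ConnectionEvent:
        """Record the connection and alert the owner immediately."""
        event = ConnectionEvent(datetime.now(timezone.utc), source_ip, account)
        self.history.append(event)
        self.notify(
            f"IP camera accessed from {source_ip} as '{account}' "
            f"at {event.timestamp.isoformat()}"
        )
        return event


# Usage with a list standing in for the notification channel.
alerts: List[str] = []
monitor = ConnectionMonitor(notify=alerts.append)
monitor.on_connect("203.0.113.7", "admin")  # documentation-range IP address
print(alerts[0])
```

Keeping the history separate from the alert channel means the log survives even if a notification is missed, which matches the abstract's point about leaving evidence behind.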

Detecting Weak Signals for Carbon Neutrality Technology using Text Mining of Web News (탄소중립 기술의 미래신호 탐색연구: 국내 뉴스 기사 텍스트데이터를 중심으로)

  • Jisong Jeong; Seungkook Roh
    • Journal of Industrial Convergence / v.21 no.5 / pp.1-13 / 2023
  • Carbon neutrality is the concept of reducing greenhouse gases emitted by human activities and bringing net emissions to zero by removing the remaining gases. It is also called "Net-Zero" or "carbon zero". Korea has declared a "2050 Carbon Neutrality policy" to cope with the climate change crisis, and various carbon reduction legislative processes are underway. Since carbon neutrality requires changes in industrial technology, it is important to prepare a system for carbon zero. This paper aims to understand the status and trends of global carbon neutrality technology. To this end, the Korean web portal "www.naver.com" was selected as the data collection scope, and Korean online articles related to carbon neutrality were collected. Carbon neutrality technology trends were analyzed using the future signal methodology and the Word2Vec algorithm, a neural network based deep learning technique. The results indicate that technology advancement is required in the steel and petrochemical sectors, which are high-carbon-emitting industries. Investment feasibility and technology advancement in the electric vehicle sector were on the rise. The results also suggest that government support for carbon neutrality and the creation of global technology infrastructure are needed. In addition, cultivating human resources is urgent, and the need to prepare support policies for carbon neutrality was confirmed.

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom; Kim, So-Eon; Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.125-132 / 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies utilized human-defined linguistic rules. In addition, because sequence-to-sequence models perform well on various natural language processing tasks such as machine translation, some studies have applied them to sentence compression. However, rule-based approaches require all rules to be defined by humans, and sequence-to-sequence approaches require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, has been proposed. Because Deleter uses a perplexity-based score computed with BERT to compress sentences, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in a sentence. Furthermore, since the data used for pre-training BERT are far from compressed sentences, this can lead to incorrect sentence compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in perplexity-based sentence scoring. Furthermore, by fine-tuning BERT on a corpus of news articles, which often contain proper nouns and often omit unnecessary modifiers, we allow BERT to measure perplexity in a way appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the proposed method improves the sentence compression performance of sentence-scoring-based models.
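
Perplexity-based deletion, as described in this abstract, can be illustrated with the toy sketch below. It is not Deleter and not the authors' scoring method: a hand-made unigram probability table (`TOY_PROBS`, hypothetical) stands in for BERT, and only a single deletion step is shown.

```python
import math
from typing import Callable, List

# Hypothetical unigram probabilities standing in for a real language model.
TOY_PROBS = {"the": 0.5, "cat": 0.1, "sat": 0.1, "really": 0.01}


def toy_log_prob(token: str) -> float:
    """Log probability of a token under the toy model (unknown words get 0.001)."""
    return math.log(TOY_PROBS.get(token, 0.001))


def perplexity(tokens: List[str], log_prob: Callable[[str], float]) -> float:
    """exp of the negative mean log probability of the tokens."""
    return math.exp(-sum(log_prob(t) for t in tokens) / len(tokens))


def delete_one(tokens: List[str], log_prob: Callable[[str], float]) -> List[str]:
    """Remove the single token whose deletion yields the lowest perplexity."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to delete one")
    candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
    return min(candidates, key=lambda c: perplexity(c, log_prob))


compressed = delete_one(["the", "cat", "really", "sat"], toy_log_prob)
print(compressed)
```

Because the score looks only at perplexity, this toy always drops the least probable word, whatever its role in the sentence. That is the limitation the abstract raises when it argues for adding linguistic information to the score.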

Introducing SEABOT: Methodological Quests in Southeast Asian Studies

  • Keck, Stephen
    • SUVANNABHUMI / v.10 no.2 / pp.181-213 / 2018
  • How to study Southeast Asia (SEA)? The need to explore and identify methodologies for studying SEA is inherent in its multifaceted subject matter. At a minimum, the region's rich cultural diversity inhibits both the articulation of decisive defining characteristics and the training of scholars who can write with confidence beyond their specialisms. Consequently, the challenges of understanding the region remain, and a consensus regarding the most effective approaches to studying its history, identity and future seems quite unlikely. Furthermore, "Area Studies" more generally has proved to be a less attractive frame of reference for burgeoning scholarly trends. This paper will propose a new tool to help address these challenges. Even though the science of artificial intelligence (AI) is in its infancy, it has already yielded new approaches to many commercial, scientific and humanistic questions. At this point, AI has been used to produce news, generate better smart phones, deliver more entertainment choices, analyze earthquakes and write fiction. The time has come to explore the possibility that AI can be put at the service of the study of SEA. The paper intends to lay out what would be required to develop SEABOT. This instrument might exist as a robot on the web which might be called upon to make the study of SEA both broader and more comprehensive. The discussion will explore the financial resources, ownership and timeline needed to make SEABOT go from an idea to a reality. SEABOT would draw upon artificial neural networks (ANNs) to mine the region's "Big Data", while synthesizing the information to form new and useful perspectives on SEA. Overcoming significant language issues, applying multidisciplinary methods and drawing upon new yields of information should produce new questions and ways to conceptualize SEA. SEABOT could lead to findings which might not otherwise be achieved. SEABOT's work might well produce outcomes which could open up solutions to immediate regional problems, provide ASEAN planners with new resources and make it possible to eventually define and capitalize on SEA's "soft power". That is, new findings should provide the basis for ASEAN diplomats and policy-makers to develop new modalities of cultural diplomacy and improved governance. Last, SEABOT might also open up avenues to tell the SEA story in new, distinctive ways. SEABOT is seen as a heuristic device to explore the results which this instrument might yield. More importantly, the discussion will also raise the possibility that an AI-driven perspective on SEA may prove to be even more problematic than it is beneficial.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the volume of data has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
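
The criterion this abstract settles on, whether an analysis uses a sentiment lexicon, can be made concrete with a small lexicon-based classifier. This is a minimal sketch under stated assumptions: the lexicon entries and weights are hypothetical, and the scoring rule (sum of word weights) is a common simple choice rather than the model built in the paper.

```python
import re
from typing import Dict

# Hypothetical sentiment lexicon: word -> polarity weight.
LEXICON: Dict[str, int] = {
    "good": 1, "great": 2, "moving": 1,
    "bad": -1, "terrible": -2, "boring": -1,
}


def classify(review: str, lexicon: Dict[str, int]) -> str:
    """Label a review by summing the lexicon weights of its words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label_a = classify("A great film with a good cast", LEXICON)
label_b = classify("Terrible plot, bad acting.", LEXICON)
print(label_a, label_b)
```

The matched lexicon words double as an explanation for each label, and the same dictionary can be reused on similar review data without retraining. These are the two advantages the abstract attributes to opinion mining.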

Analysis of Rice Blast Outbreaks in Korea through Text Mining (텍스트 마이닝을 통한 우리나라의 벼 도열병 발생 개황 분석)

  • Song, Sungmin; Chung, Hyunjung; Kim, Kwang-Hyung; Kim, Ki-Tae
    • Research in Plant Disease / v.28 no.3 / pp.113-121 / 2022
  • Rice blast is a major plant disease that occurs worldwide and significantly reduces rice yields. Rice blast disease occurs periodically in Korea, causing significant socio-economic damage due to the unique status of rice as a major staple crop. A disease outbreak prediction system is required for preventing rice blast disease. Epidemiological investigations of disease outbreaks can aid in decision-making for plant disease management. Currently, plant disease prediction and epidemiological investigations are mainly based on quantitatively measurable, structured data such as crop growth and damage, weather, and other environmental factors. On the other hand, text data related to the occurrence of plant diseases are accumulated along with the structured data. However, epidemiological investigations using these unstructured data have not been conducted. The useful information extracted using unstructured data can be used for more effective plant disease management. This study analyzed news articles related to the rice blast disease through text mining to investigate the years and provinces where rice blast disease occurred most in Korea. Moreover, the average temperature, total precipitation, sunshine hours, and supplied rice varieties in the regions were also analyzed. Through these data, it was estimated that the primary causes of the nationwide outbreak in 2020 and the major outbreak in Jeonbuk region in 2021 were meteorological factors. These results obtained through text mining can be combined with deep learning technology to be used as a tool to investigate the epidemiology of rice blast disease in the future.
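
Finding the years and provinces mentioned most often in disease-related news can be sketched as a simple counting pass. The province list, the sample articles, and the function name `count_mentions` are hypothetical, and plain substring matching is used here in place of whatever text-mining pipeline the authors actually applied.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical subset of province names to look for.
PROVINCES: List[str] = ["Jeonbuk", "Jeonnam", "Chungnam"]


def count_mentions(
    articles: Iterable[Tuple[int, str]], keyword: str = "rice blast"
) -> Counter:
    """Count (year, province) pairs among articles that mention the keyword."""
    counts: Counter = Counter()
    for year, text in articles:
        lowered = text.lower()
        if keyword not in lowered:
            continue  # skip articles that are not about the disease
        for province in PROVINCES:
            if province.lower() in lowered:
                counts[(year, province)] += 1
    return counts


# Made-up headlines for illustration only.
sample_articles = [
    (2020, "Rice blast spreads across Jeonbuk and Jeonnam"),
    (2021, "Jeonbuk farmers report rice blast damage"),
    (2021, "Jeonbuk rice harvest festival opens"),
]
mentions = count_mentions(sample_articles)
print(mentions.most_common())
```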

Popularization of Marathon through Social Network Big Data Analysis : Focusing on JTBC Marathon (소셜 네트워크 빅데이터 분석을 통한 마라톤 대중화 : JTBC 마라톤대회를 중심으로)

  • Lee, Ji-Su; Kim, Chi-Young
    • Journal of Korea Entertainment Industry Association / v.14 no.3 / pp.27-40 / 2020
  • The marathon has long been established as a representative lifestyle for all ages. With the recent spread of the work-life balance trend across society, the marathon, with its relatively low barrier to entry, is gaining popularity among young people in their 20s and 30s. By analyzing the issues and related words of the marathon event through keywords, this study examines the sportainment elements of a marathon event that is popular among young people and suggests a development plan for a differentiated event. To analyze keywords and related words, blogs, cafes, and news provided by Naver and Daum were selected as analysis channels, and 'JTBC Marathon' and 'Culture' were used as keywords for the data search. The data analysis period was limited to a three-month period from August 13, 2019 to November 13, 2019, when the application for participation in the 2019 JTBC Marathon was started. For data collection and analysis, frequency and matrix data were extracted using the social matrix program Textom. In addition, the strength of relationships was quantified by analyzing the connection structure and the degree centrality between words. Although the marathon is an individual sport, young people share the common denominator of "running" and form a new cultural group called a "running crew" with other young people. The results show that a marathon event culture has formed in which the race is a festival where people train together and participate together, moving away from the image of the marathon as a solitary run and a fight against oneself.
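
Degree centrality over a word co-occurrence network, the measure named in this abstract, can be computed as in the sketch below. The study used the Textom program. This stand-alone version, the sample keyword lists, and the function name are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set


def degree_centrality(documents: Iterable[List[str]]) -> Dict[str, float]:
    """Link words that co-occur in a document; centrality = neighbours / (n - 1)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for words in documents:
        for a, b in combinations(sorted(set(words)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {word: len(linked) / (n - 1) for word, linked in neighbours.items()}


# Made-up keyword lists standing in for blog, cafe, and news posts.
posts = [
    ["marathon", "running", "crew"],
    ["marathon", "festival"],
    ["running", "crew"],
]
centrality = degree_centrality(posts)
for word, value in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{word}: {value:.2f}")
```

In this toy network "marathon" co-occurs with every other word, so it receives the maximum centrality of 1.0.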

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon; Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor-quality from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, they are not only easy to collect but also affect business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, but recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set.
First, as comparative models, we adopt popular machine learning algorithms for text classification in sentiment analysis, such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of the data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to understand how well and in what way these models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features in text analysis. The reasons for combining these two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers and massively parallel processing, whereas LSTM is not capable of highly parallel processing. Like faucets, the input, output, and forget gates of the LSTM can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can compensate for the CNN's inability to capture long-term dependencies.
Furthermore, when LSTM is used in CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be learned simultaneously. The combined CNN-LSTM achieved 90.33% accuracy. It is slower than CNN but faster than LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and it has the advantage of improving learning layer by layer using the end-to-end structure of LSTM. For these reasons, this study seeks to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
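
The interplay of convolutional feature extraction and LSTM gating described in this abstract can be illustrated with a scalar, dependency-free sketch. It is not the authors' model: the wiring (convolution output fed step by step into one LSTM cell), the parameter names, and the all-zero example weights are assumptions chosen so the arithmetic is easy to follow.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(seq: List[float], kernel: List[float]) -> List[float]:
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(kernel[j] * seq[i + j] for j in range(k)))
        for i in range(len(seq) - k + 1)
    ]


def lstm_step(
    x: float, h_prev: float, c_prev: float, p: Dict[str, float]
) -> Tuple[float, float]:
    """One step of a scalar LSTM cell with input, forget, and output gates."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g                                   # updated cell state
    h = o * math.tanh(c)                                     # updated hidden state
    return h, c


def cnn_lstm(seq: List[float], kernel: List[float], p: Dict[str, float]) -> float:
    """Feed the convolution features through the LSTM and return the last hidden state."""
    h, c = 0.0, 0.0
    for feature in conv1d(seq, kernel):
        h, c = lstm_step(feature, h, c, p)
    return h


# All-zero weights: every gate sits at 0.5 and the candidate memory is 0.
zero_params = {
    name: 0.0
    for name in ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
}
features = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0])
h_demo, c_demo = lstm_step(1.0, 0.0, 2.0, zero_params)
print(features, h_demo, c_demo)
```

With zero weights the forget gate halves the previous cell state, which makes the faucet analogy visible: each gate scales how much information flows through at that step.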