• Title/Summary/Keyword: news data


A Method on Associated Document Recommendation with Word Correlation Weights (단어 연관성 가중치를 적용한 연관 문서 추천 방법)

  • Kim, Seonmi;Na, InSeop;Shin, Juhyun
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.2
    • /
    • pp.250-259
    • /
    • 2019
  • Big data processing technology and artificial intelligence (AI) are increasingly attracting attention, and natural language processing is an important research area of AI. In this paper, we use Korean news articles to extract topic distributions in documents and word distribution vectors in topics through LDA-based topic modeling. Then we use Word2vec to vectorize words and generate a weight matrix to derive a relevance score that considers the semantic relationship between the words. We propose a way to recommend documents in descending order of this score.
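The abstract above outlines a pipeline of LDA topic modeling, Word2vec word vectors, and a correlation-weighted relevance score. Below is a minimal sketch of that kind of pipeline in Python with gensim; the toy documents and the way topic similarity and word similarity are combined are assumptions for illustration, not the authors' exact weighting scheme.

```python
import numpy as np
from gensim import corpora, models

docs = [["economy", "growth", "market"],
        ["market", "stocks", "investors"],
        ["weather", "rain", "storm"]]                 # pre-tokenized news articles (toy data)

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
w2v = models.Word2Vec(docs, vector_size=50, min_count=1, seed=0)

def topic_vector(bow):
    """Dense topic-distribution vector for one document."""
    vec = np.zeros(lda.num_topics)
    for t, p in lda.get_document_topics(bow, minimum_probability=0.0):
        vec[t] = p
    return vec

def relevance(query, cand):
    """Topic similarity weighted by average Word2vec similarity between the documents' words."""
    tq, tc = topic_vector(corpus[query]), topic_vector(corpus[cand])
    topic_sim = float(tq @ tc / (np.linalg.norm(tq) * np.linalg.norm(tc)))
    word_weight = float(np.mean([w2v.wv.similarity(a, b)
                                 for a in docs[query] for b in docs[cand]]))
    return topic_sim * word_weight

ranked = sorted(range(1, len(docs)), key=lambda i: relevance(0, i), reverse=True)
print(ranked)                                          # candidate documents, highest score first
```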

The Next Generation of Energy News Big Data Analytics (차세대 에너지 관련 뉴스 빅데이터 분석)

  • Lee, YeChan;Cho, HaeChan;Ban, ChaeHoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2016.10a
    • /
    • pp.451-453
    • /
    • 2016
  • In the information age, in which massive amounts of data are produced and stored, the importance of big data, which makes it possible to infer the future and identify directions from current and past data, is increasingly emphasized. Unstructured large-scale data are statistically analyzed and structured using the big data analysis tool R. In this paper, we use R to analyze big data on next-generation energy appearing in the news. We collect next-generation energy related data from news articles and use the collected keywords to predict the emergence of efficient next-generation energy sources in the near future. By presenting the flow and direction of the energy industry and deriving technical tasks for decision making, this work is expected to support flexible management and decision making and to help predict and prevent the root causes of technical problems in advance.
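As a rough illustration of the frequency-based keyword analysis described above (the paper uses R; Python is used here for consistency with the other sketches), the snippet below tallies mentions of a hypothetical list of next-generation energy keywords across news texts.

```python
from collections import Counter

# toy article texts and a hypothetical list of next-generation energy keywords
articles = [
    "solar power and hydrogen fuel cells lead next generation energy investment",
    "offshore wind and hydrogen projects expand as solar costs fall",
]
energy_keywords = {"solar", "wind", "hydrogen", "nuclear", "geothermal"}

counts = Counter(
    word
    for article in articles
    for word in article.lower().split()
    if word in energy_keywords
)
print(counts.most_common())   # most frequently mentioned energy sources first
```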

An Approach for Stock Price Forecast using Long Short Term Memory

  • K.A.Surya Rajeswar;Pon Ramalingam;Sudalaimuthu.T
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.4
    • /
    • pp.166-171
    • /
    • 2023
  • Stock price analysis is a growing concern in financial time series research. The purpose of this study is to analyze price parameters such as date, high, low, and news feeds about the stock exchange price. Long short term memory (LSTM) is a cutting-edge technique for predicting data based on time series, and it performs well on long sequences of data. This paper presents a Long Short Term Memory model used to analyze stock price ranges over 10-day and 20-day exponential moving averages. The proposed approach performs better when technical indicators of the stock price are used, achieving an accuracy of 82.6% and a cross entropy of 71%.
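A minimal sketch of an LSTM regressor fed with 10-day and 20-day exponential moving averages as technical indicators, in the spirit of the approach above; the synthetic price series, window size, and layer widths are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# synthetic closing prices stand in for real stock data
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 100 + rng.standard_normal(300).cumsum()})
df["ema10"] = df["close"].ewm(span=10).mean()      # 10-day exponential moving average
df["ema20"] = df["close"].ewm(span=20).mean()      # 20-day exponential moving average

def make_windows(features, target, window=20):
    """Slice the feature matrix into overlapping sequences for the LSTM."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(target[i + window])
    return np.array(X), np.array(y)

feats = df[["close", "ema10", "ema20"]].to_numpy()
X, y = make_windows(feats, df["close"].to_numpy())

model = Sequential([
    LSTM(64, input_shape=(X.shape[1], X.shape[2])),
    Dense(1),                                      # next-day closing price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```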

Covid 19 news data analysis (코로나 19 뉴스데이터 분석 및 시각화)

  • Hur, Tai-seong;Hwang, In Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.241-242
    • /
    • 2021
  • In this paper, we used news data related to COVID-19 circulated over the eight months from January 2020 to August 2020 to compute word frequencies by period and region, and used the results to analyze and visualize their correlation with COVID-19. The news data were collected from BIGKinds, the news big data system operated by the Korea Press Foundation. The results were visualized through a web service: when a region and a period are selected, the analysis results are loaded and displayed, including the news frequency of the selected region relative to all regions, the main keywords of the selected region, and the change in major keywords by region and by date. This visualization makes it easy to grasp the correlation between COVID-19 and the major keywords of previously occurring events.
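A minimal sketch of tallying keyword frequencies by region and by date from collected news records, as described above; the column names and toy rows are assumptions for illustration.

```python
import pandas as pd

# toy records of (date, region, keyword) extracted from collected news articles
news = pd.DataFrame({
    "date":    pd.to_datetime(["2020-02-01", "2020-02-01", "2020-03-15"]),
    "region":  ["Seoul", "Daegu", "Seoul"],
    "keyword": ["mask", "outbreak", "distancing"],
})

# keyword counts per region
by_region = news.groupby(["region", "keyword"]).size().rename("count").reset_index()

# monthly frequency of each keyword, suitable for a time-series chart on a web page
monthly = (news.set_index("date")
               .groupby([pd.Grouper(freq="MS"), "keyword"])
               .size()
               .unstack(fill_value=0))
print(by_region)
print(monthly)
```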

The Effect of Covid-19 on Suicide through Statistical Analysis and Topic Modeling of News Articles (통계 분석과 뉴스 기사 토픽 모델링을 통한 코로나19가 자살에 미치는 영향 분석)

  • Kwon, Minji;Kim, Junchul
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.11a
    • /
    • pp.518-520
    • /
    • 2021
  • Due to the prolonged, worldwide spread of COVID-19, people have been experiencing economic and psychological difficulties, and concerns about resulting suicide attempts have grown. In this study, we analyzed the effect of COVID-19 on suicide through suicide-death statistics and topic modeling of suicide-related news articles. Numerically, the results showed a 'honeymoon period' in which the suicide rate temporarily decreased immediately after the disaster; semantically, the importance of suicide prevention was continuously highlighted. In addition, investigations and facts concerning celebrities and socially prominent incidents were revealed through the media, and economy-related issues that persisted beyond the start of the year were identified.

Study of major issues and trends facing ports, using big data news: From 1991 to 2020 (뉴스 빅데이터를 활용한 항만이슈 변화연구 : 1991~2020)

  • Yoon, Hee-Young
    • Journal of Korea Port Economic Association
    • /
    • v.37 no.1
    • /
    • pp.159-178
    • /
    • 2021
  • This study analyzed issues and trends related to ports using 86,611 news articles from the 30 years between 1991 and 2020, collected through BIGKinds, a big data news analysis service. The analysis was based on the keyword analysis, word cloud, and relationship diagram features offered by BIGKinds. The results for the last 30 years are summarized as follows. First, during Phase 1 (1991-2000), individual ports such as Busan, Incheon, and Gwangyang tried to strengthen their own competitiveness. During Phase 2 (2001-2010), efforts were made to gain more professional and specialized port management capabilities by establishing the Busan Port Authority in 2004, the Incheon Port Authority in 2005, and the Ulsan Port Authority in 2007. During Phase 3 (2011-2020), the promotion of future-oriented, eco-friendly, and smart ports was a major issue: efforts to reduce particulate matter and pollutants produced by ports accelerated, and attempts to build smart ports driven by port automation and digitalization intensified. Lastly, for 2020, when the maritime sector was severely hit by the unexpected shock of the COVID-19 pandemic, a microscopic analysis of trends and issues in 2019 and 2020 was conducted to examine the impact of the pandemic on the maritime industry. It was found that the shipping and port industries experienced more drastic changes than ever while preparing for a post-pandemic era and promoting future-oriented ports. This study makes policy suggestions based on the analysis of port-related news articles and trends, and it is expected that further studies on enhancing the competitiveness of ports and devising sustainable development strategies will follow through comparative analysis of port issues across countries, advancing academic research on ports.
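A minimal sketch of the kind of keyword co-occurrence "relationship diagram" analysis mentioned above: keywords appearing in the same article are linked, and edge weights count how often they co-occur. The toy keyword lists are assumptions for illustration.

```python
import itertools
import networkx as nx

# toy per-article keyword lists extracted from port-related news
articles = [
    ["Busan", "port", "automation"],
    ["Incheon", "port", "authority"],
    ["Busan", "port", "smart"],
]

G = nx.Graph()
for keywords in articles:
    for a, b in itertools.combinations(sorted(set(keywords)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1        # seen together again: strengthen the link
        else:
            G.add_edge(a, b, weight=1)

# strongest co-occurrence pairs first
for a, b, d in sorted(G.edges(data=True), key=lambda e: e[2]["weight"], reverse=True)[:5]:
    print(a, b, d["weight"])
```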

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep learning sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors; one of them is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data, and these have been widely used in studies of sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with highly developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘' (adjective) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt morphemes as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews containing many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which may be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize them as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with respect to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can a satisfactory level of classification accuracy be achieved when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting them and then compare the classification accuracy through a non-static CNN (convolutional neural network) model that takes the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used.
To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 Naver News articles, arguably corresponding to Google's news data. The six primary sets of morpheme vectors constructed in this study differ in terms of three criteria. First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely only sentence splitting, or additional spelling and spacing corrections after sentence separation. Third, they vary in the form of input fed into the word vector model: the morphemes themselves, or the morphemes with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived through a CBOW (continuous bag-of-words) model with a context window of 5 and a vector dimension of 300. The results suggest that using text from the same domain even with lower grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, which was devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for a morpheme to be included do not appear to have any definite influence on the classification accuracy.
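A minimal sketch of a non-static CNN text classifier in the spirit of the model described above: pretrained morpheme vectors initialize an Embedding layer that remains trainable ("non-static"), followed by Conv1D filters and global max pooling. The vocabulary size, random stand-in vectors, and toy batch are assumptions for illustration.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.initializers import Constant

vocab_size, embed_dim, max_len = 1000, 300, 50
# stand-in for the CBOW morpheme vectors (in the paper these come from Naver News or review text)
pretrained = np.random.uniform(-0.25, 0.25, (vocab_size, embed_dim))

model = Sequential([
    Embedding(vocab_size, embed_dim,
              embeddings_initializer=Constant(pretrained),
              trainable=True),                      # non-static: vectors are fine-tuned
    Conv1D(100, 3, activation="relu"),              # filters over 3-morpheme windows
    GlobalMaxPooling1D(),
    Dense(1, activation="sigmoid"),                 # positive / negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# toy batch of morpheme-index sequences and sentiment labels
X = np.random.randint(0, vocab_size, size=(8, max_len))
y = np.random.randint(0, 2, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
```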

Data Processing and Visualization Method for Retrospective Data Analysis and Research Using Patient Vital Signs (환자의 활력 징후를 이용한 후향적 데이터의 분석과 연구를 위한 데이터 가공 및 시각화 방법)

  • Kim, Su Min;Yoon, Ji Young
    • Journal of Biomedical Engineering Research
    • /
    • v.42 no.4
    • /
    • pp.175-185
    • /
    • 2021
  • Purpose: Vital signs are used to help assess a person's general physical health, give clues to possible diseases, and show progress toward recovery. Researchers are using vital sign data and AI (artificial intelligence) to manage a variety of diseases and predict mortality. To analyze vital sign data using AI, it is important to select and extract vital sign data suitable for the research purpose. Methods: We developed a method to visualize vital signs and early warning scores by processing retrospective vital sign data collected from EMR (electronic medical records) and patient monitoring devices. The vital sign data used for development were obtained from the open EMR big data MIMIC-III and a wearable patient monitoring device (CareTaker). Data processing and visualization were developed in Python. We used the processed results with machine learning to predict mortality in ICU patients. Results: We calculated the NEWS (National Early Warning Score) to understand the patient's condition. Vital sign data with different measurement times and frequencies were sampled at equal time intervals, and missing data were interpolated to reconstruct the data. The normal and abnormal states of the vital signs were visualized as color-coded graphs. The mortality prediction with the processed data and machine learning achieved an AUC of 0.892. Conclusion: This visualization method will help researchers easily understand a patient's vital sign status over time and extract the necessary data.
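A minimal sketch of the preprocessing described above: vital signs recorded at irregular times are resampled to a fixed interval, gaps are interpolated, and a single NEWS sub-score (respiratory rate) is computed. The column names and toy readings are assumptions for illustration.

```python
import pandas as pd

vitals = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 00:03", "2021-01-01 01:10", "2021-01-01 02:02"]),
    "resp_rate": [14, None, 26],                  # breaths/min, with one missing reading
}).set_index("time")

# equal one-hour sampling, then linear interpolation of the gap
hourly = vitals.resample("1h").mean().interpolate(method="linear")

def news_resp_score(rr):
    """NEWS sub-score for respiratory rate, following the published NEWS bands."""
    if rr <= 8:
        return 3
    if rr <= 11:
        return 1
    if rr <= 20:
        return 0
    if rr <= 24:
        return 2
    return 3

hourly["resp_score"] = hourly["resp_rate"].apply(news_resp_score)
print(hourly)
```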

A Study on the Prediction Index for Chart Success of Digital Music Contents based on Analysis of Social Data (소셜 데이터 분석을 통한 음원 흥행 예측 지표 연구)

  • Kim, Ga-Yeon;Kim, Myoung-Jun
    • Journal of Digital Contents Society
    • /
    • v.19 no.6
    • /
    • pp.1105-1114
    • /
    • 2018
  • The growth of the domestic digital music content market has been remarkable in recent years, and with it the need to predict the chart success of digital music content has grown. This paper proposes prediction indexes for the chart success of digital music content through an analysis of the correlation between social data, such as Internet news and SNS, and entries into Melon's weekly music charts. We collected a total of 10 social data items for each male and female artist and performed cluster analysis. Through this, we found meaningful prediction indexes for the chart success of digital music content for male and female artists.
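A minimal sketch of correlating social-data counts with chart entry and clustering artists by their social-data profile, in the spirit of the analysis above; the indicators and toy numbers are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

artists = pd.DataFrame({
    "news_mentions": [120, 30, 85, 10, 200, 55],   # hypothetical counts per artist
    "sns_posts":     [900, 150, 600, 80, 1500, 300],
    "chart_entry":   [1, 0, 1, 0, 1, 0],           # 1 = entered Melon's weekly chart
})

# correlation of each social indicator with chart entry
print(artists.corr()["chart_entry"])

# cluster artists by their social-data profile
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
artists["cluster"] = kmeans.fit_predict(artists[["news_mentions", "sns_posts"]])
print(artists)
```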

A Development of Web Proxy for the Satellite Communication (위성통신을 위한 웹 프록시 개발)

  • Jeon, Sung-Yoon;Kim, Geun-Hyung
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.12
    • /
    • pp.1403-1412
    • /
    • 2013
  • On maritime ships or airplanes, users must rely on a satellite channel to use web services. However, the satellite channel is costly and does not give users a satisfactory response time. On a ship, users may receive plenty of extra data when they read Internet news; this extra data may consist of unnecessary images and advertisements, so they also pay unnecessary data usage charges. In this paper, we suggest a proxy model that addresses the problems of cost and speed. The proposed proxy reduces the amount of data sent through the satellite link by means of image and advertisement blocking, caching, and image re-requesting functions. Its performance was tested over a real satellite communication link.
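A minimal sketch (not the authors' implementation) of the two bandwidth-saving ideas described above: block requests for images and known advertisement hosts, and serve repeated requests from a local cache instead of the satellite link. The blocklist, extensions, and URLs are hypothetical.

```python
from urllib.parse import urlparse

AD_HOSTS = {"ads.example.com", "tracker.example.net"}       # hypothetical blocklist
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

cache = {}                                                   # url -> cached response body

def fetch_over_satellite(url):
    """Stand-in for the costly request over the satellite channel."""
    return f"<response for {url}>"

def handle_request(url):
    parsed = urlparse(url)
    if parsed.hostname in AD_HOSTS or parsed.path.lower().endswith(IMAGE_EXTENSIONS):
        return None                                          # blocked: nothing crosses the link
    if url in cache:
        return cache[url]                                    # cache hit: no satellite traffic
    body = fetch_over_satellite(url)
    cache[url] = body
    return body

print(handle_request("http://news.example.com/article"))     # fetched once over the link
print(handle_request("http://news.example.com/article"))     # served from the local cache
print(handle_request("http://ads.example.com/banner.gif"))   # blocked -> None
```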