• Title/Summary/Keyword: news data


Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.83-105 / 2019
  • This study proposes an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. To this end, large-scale news data were collected from the Web and time series consisting of 120 time segments were built. To create a training data set for testing the performance of the proposed model, 440 representative keywords were manually classified into 8 trend types. The study introduces Dynamic Time Warping (DTW), a method commonly used in time series analytics, and proposes an applied model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with the k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved the best micro-averaged F1 score of 74.3%. Across all of the experiments, the proposed model outperformed both ED and DTW.
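
The general idea of combining moving-average smoothing with DTW and kNN can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say how MA and DTW are combined, so the sketch simply smooths both series with a trailing moving average before computing a standard DTW distance. The function names, the window size, the value of `k`, and the toy series are all hypothetical.

```python
from collections import Counter


def moving_average(series, window=3):
    """Trailing moving average; early points average over the available prefix."""
    out = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def dtw_distance(a, b):
    """Standard O(len(a) * len(b)) DTW with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ma_dtw_distance(a, b, window=3):
    """Assumed combination: smooth both series first, then compare with DTW."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))


def knn_classify(query, labeled, k=3, window=3):
    """Majority vote among the k labeled series closest to the query."""
    ranked = sorted(labeled, key=lambda item: ma_dtw_distance(query, item[0], window))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy keyword-frequency series with hypothetical trend labels.
train = [
    ([1, 2, 3, 4, 5, 6], "rising"),
    ([0, 1, 3, 4, 6, 7], "rising"),
    ([6, 5, 4, 3, 2, 1], "falling"),
    ([7, 6, 4, 3, 1, 0], "falling"),
]
print(knn_classify([1, 1, 2, 4, 5, 7], train, k=3))
```

Smoothing before alignment reduces the influence of short spikes on the warping path, which is one plausible reason a moving average could help when classifying by overall trend shape.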

Time-Series based Dataset Selection Method for Effective Text Classification (효율적인 문헌 분류를 위한 시계열 기반 데이터 집합 선정 기법)

  • Chae, Yeonghun;Jeong, Do-Heon
    • The Journal of the Korea Contents Association / v.17 no.1 / pp.39-49 / 2017
  • As Internet technology advances, data on the Web is increasing sharply. Many studies have examined incremental learning for classifying such growing data effectively. Web documents contain time-series information such as the publication date, and reflecting this information in classification can make classification more effective. In this study, we analyze the time-series variation of words and propose an efficient classification method that divides the dataset based on this analysis. For the experiments, we collected 1 million online news articles that include time-series information. We divided the dataset and classified it using SVM and Naïve Bayes, and classification performance improved with both models. The results show that reflecting time-series information can improve classification performance.

A Study on the Change of Smart City's Issues and Perception : Focus on News, Blog, and Twitter (스마트도시의 이슈와 인식변화에 관한 연구 : 뉴스, 블로그, 트위터 자료를 중심으로)

  • Jang, Hwan-Young
    • Journal of Cadastre & Land InformatiX / v.49 no.2 / pp.67-82 / 2019
  • The purpose of this study is to analyze the issues and perceptions surrounding smart cities. First, using a big data analysis platform, big data analysis of smart cities was conducted to derive keywords by year, word clouds, and the frequency of smart city keywords over time. Second, trends and flows by area were analyzed by reclassifying the major keywords for each year using meta-keywords. Third, the flow of emotional perception of smart cities and the major emotional keywords were derived. According to the analysis, whereas U-City in the past centered mostly on creating infrastructure for new towns, recent smart cities focus on sustainable urban construction led by citizens. In addition, whereas infrastructure, services, and technology were emphasized in the past, management and methodology have been emphasized recently, and positive perception of smart cities has been growing. The study can serve as basic data on the past, present, and future of smart cities in Korea at a time when smart city services are being built across the country.

A Study on Monitoring Method of Citizen Opinion based on Big Data : Focused on Gyeonggi Local Currency (Gyeonggi Money) (빅데이터 기반 시민의견 모니터링 방안 연구 : "경기지역화폐"를 중심으로)

  • Ahn, Soon-Jae;Lee, Sae-Mi;Ryu, Seung-Ei
    • Journal of Digital Convergence / v.18 no.7 / pp.93-99 / 2020
  • Text mining is a big data analysis method that extracts meaningful information from large-scale unstructured text data. In this study, text mining was used to monitor citizens' opinions on policies and systems being implemented. We collected 5,108 newspaper articles and 748 online cafe posts related to 'Gyeonggi Local Currency' and performed frequency analysis, TF-IDF analysis, association analysis, and word tree visualization analysis. The results show that many news articles dealt with the purpose of introducing the local currency, the benefits provided, and the method of use, whereas content related to the actual use of the local currency appeared in the online cafe posts. The news acted as an informant promoting the local currency in order to revitalize it, while the online cafe posts consisted of the opinions of citizens who use the local currency. SNS and text mining are expected to help effectively activate various policies as well as the local currency.
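
The frequency and TF-IDF analyses mentioned above can be illustrated with a small sketch. It assumes the documents are already tokenized (Korean morphological analysis is out of scope here) and uses one common TF-IDF variant, relative term frequency times `log(N / df)`; the abstract does not state which weighting the authors used. The function name and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized documents (hypothetical).
docs = [
    ["local", "currency", "benefit"],
    ["local", "currency", "store", "use"],
    ["local", "policy"],
]
for doc_weights in tf_idf(docs):
    top = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)
    print(top[:2])
```

A term that appears in every document, such as "local" here, receives a weight of zero under this variant, which is why TF-IDF tends to surface the words that distinguish one source (for example, cafe posts) from another.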

Coreference Resolution for Korean Using Random Forests (랜덤 포레스트를 이용한 한국어 상호참조 해결)

  • Jeong, Seok-Won;Choi, MaengSik;Kim, HarkSoo
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.535-540 / 2016
  • Coreference resolution identifies mentions in documents and groups the mentions that refer to the same entity. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models need large training data sets that are manually annotated with coreferring mention tags. Unfortunately, we could not find usable open data for training ML-based models in Korean. Therefore, we propose an efficient coreference resolution model that needs less training data than other ML-based models. The proposed model identifies coreferring mentions using random forests based on sieve-guided features. In experiments with baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, higher than those of other ML-based models.

Machine Learning based Firm Value Prediction Model: using Online Firm Reviews (머신러닝 기반의 기업가치 예측 모형: 온라인 기업리뷰를 활용하여)

  • Lee, Hanjun;Shin, Dongwon;Kim, Hee-Eun
    • Journal of Internet Computing and Services / v.22 no.5 / pp.79-86 / 2021
  • As the usefulness of big data analysis draws attention, many studies in business research have begun to use big data to predict firm performance. Previous studies have mainly relied on data from outside the firm, such as news articles and social media platforms. Voices within the firm, in the form of employee satisfaction or evaluations of the firm's strengths and weaknesses, can potentially affect firm value. However, there is insufficient evidence that online employee reviews are valid predictors of firm value because the data is relatively difficult to obtain. To fill this gap, we used 97,216 reviews from 2014 to 2019 collected by JobPlanet, an online firm review website in Korea, and developed a machine learning-based predictive model. Among the proposed models, the LSTM-based model showed the highest accuracy, 73.2%, and the lowest MAE, 0.359. We expect this study to serve as a useful case in firm value prediction for domestic companies.

A Study on deduction of important factors for new infectious diseases through big data analysis (빅데이터 분석을 통한 신종감염병 중요 요인 도출)

  • Suh, Kyung-Do
    • Journal of Industrial Convergence / v.19 no.3 / pp.35-40 / 2021
  • This study attempted to derive important factors of emerging infectious diseases by collecting and analyzing text data on emerging infectious diseases. For this purpose, articles in the Naver News database were crawled directly, pre-processed, and used for data analysis. Additional analysis was performed using Big Kinds. The priority analysis ranked importance in the order of corona, infectious disease, quarantine, vaccine, outbreak, virus, infection, and development. The closeness (proximity) centrality analysis ranked importance in the order of government, death, and plan, and the Big Kinds analysis showed that Covid-19 and the Korea Centers for Disease Control and Prevention were important. These results suggest that government policy support is needed to raise public awareness of new infectious diseases, prevent disease, and develop vaccines and treatments.
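
The centrality analysis described above is assumed here to be closeness centrality computed on a keyword co-occurrence network; the abstract does not specify how the network was built or which software was used. The sketch below links two keywords when they appear in the same article and scores a keyword by the number of reachable keywords divided by the sum of its shortest-path distances to them. The function names and the toy keyword lists are hypothetical.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected graph linking keywords that appear in the same document."""
    graph = {}
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def closeness_centrality(graph, node):
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy per-article keyword lists (hypothetical).
docs = [
    ["government", "vaccine"],
    ["government", "quarantine"],
    ["government", "death"],
    ["vaccine", "development"],
]
graph = cooccurrence_graph(docs)
ranking = sorted(graph, key=lambda kw: closeness_centrality(graph, kw), reverse=True)
print(ranking)
```

In this toy network "government" co-occurs with the most keywords and is at most two steps from every other keyword, so it ranks first; a keyword with high closeness is one that sits near the center of the discourse rather than one that merely occurs often.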

Big Data Analysis on Daegu-Gyeongbuk Administrative Integration (대구·경북 행정통합에 대한 빅데이터 분석)

  • Song, Hwa Young;Park, Han Woo
    • The Journal of the Korea Contents Association / v.21 no.5 / pp.139-148 / 2021
  • This study examines public attitudes and reactions regarding administrative integration in the Daegu and Gyeongbuk area. Specifically, it uses social big data, including text comments on online news articles and YouTube video clips. The collected data are analyzed to compare two periods, before and after the inauguration of the Public Opinion Committee for One Daegu-Gyeongbuk. We found that favorable responses to administrative integration have gradually increased since the launch of the Committee. However, specific administrative procedures and discussion topics are still missing from the frequently used words in the collected data. Thus, the Committee needs to provide a variety of information and materials related to administrative integration.

A Data Analysis and Visualization of AI Ethics -Focusing on the interactive AI service 'Lee Luda'- (인공지능 윤리 인식에 대한 데이터 분석 및 시각화 연구 -대화형 인공지능 서비스 '이루다'를 중심으로-)

  • Lee, Su-Ryeon;Choi, Eun-Jung
    • Journal of Digital Convergence / v.20 no.2 / pp.269-275 / 2022
  • As artificial intelligence services aimed at people increase, so do social demands that artificial intelligence be built on an ethical basis. Following this trend, the government and businesses are preparing policies and norms related to artificial intelligence ethics. To establish reasonable policies and norms, the first step is to understand the public's perceptions. In this paper, social data and news comments were collected and analyzed to understand public perception of artificial intelligence and ethics. Interest analysis, emotional analysis, and discourse analysis were performed on the collected datasets and visualized. The analysis showed an inverse relationship between interest in "artificial intelligence ethics" and favorability toward "artificial intelligence." In the discourse analysis, the biggest issue was "personal information leakage," and there was also discourse on the contamination and bias of training data and on whether computer-made artificial intelligence should be given legal personality. This study can be used as data for grasping public perception when preparing ethical norms and policies for artificial intelligence.

Exploring the phenomenon of veganphobia in vegan food and vegan fashion (비건 음식과 비건 패션에서 나타난 비건포비아 현상에 대한 탐구)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture / v.32 no.3 / pp.381-397 / 2024
  • This study investigates the negative perceptions (veganphobia) held by consumers toward vegan diets and fashion and aims to foster a genuine acceptance of ethical veganism in consumption. The textual data were web-crawled from Korean online posts, including news articles, blogs, forums, and tweets, containing keywords such as "contradiction," "dilemma," "conflict," "issues," "vegan food" and "vegan fashion" from 2013 to 2021. Data analysis was conducted through text mining, network analysis, and clustering analysis using Python and NodeXL programs. The analysis revealed distinct negative perceptions regarding vegan food. Key issues included the perception of hypocrisy among vegetarians, associations with specific political leanings, conflicts between environmental and animal rights, and contradictions between views on companion animals and livestock. Regarding the vegan fashion industry, the eco-friendliness of material selection and design processes were seen as the pivotal factors shaping negative attitudes. Furthermore, the study identified a shared negative perception regarding vegan food and vegan fashion. This negativity was characterized by confusion and conflicts between animal and environmental rights, biased perceptions linked to specific political affiliations, perceived self-righteousness among vegetarians, and general discomfort toward them. These factors collectively contributed to a broader negative perception of vegan consumption. In conclusion, this study is significant in understanding the complex perceptions and attitudes that consumers hold toward vegan food and fashion. The insights gained from this research can aid in the design of more effective campaign strategies aimed at promoting vegan consumerism, ultimately contributing to a more widespread acceptance of ethical veganism in society.