• Title/Summary/Keyword: news data

Drought evaluation using unstructured data: a case study for Boryeong area (비정형 데이터를 활용한 가뭄평가 - 보령지역을 중심으로 -)

  • Jung, Jinhong;Park, Dong-Hyeok;Ahn, Jaehyun
    • Journal of Korea Water Resources Association / v.53 no.12 / pp.1203-1210 / 2020
  • Drought is caused by a combination of hydrological and meteorological factors, which makes it difficult to assess drought events accurately; various drought indices have therefore been developed to interpret them quantitatively. However, the drought indices currently in use are each calculated from the deficit of a single variable, so they cannot accurately identify drought events that arise from multiple causes: a shortage in a single variable may be judged a drought even when no drought is occurring. Meanwhile, indices built from unstructured data, which is widely used in big data analysis, are being developed in other fields and have been reported to perform better. In this study, we therefore calculate a drought index by combining unstructured data (news data) with the meteorological and hydrological information (rainfall and dam inflow) used by existing drought indices, and we evaluate its usefulness for drought interpretation by verifying the calculated index. The Clayton copula function was used to calculate the joint drought index, and its parameter was estimated by the calibration method. The analysis showed that the drought index incorporating unstructured data represents drought periods well compared with the existing drought indices (SPI, SDI). In addition, its ROC scores were higher than those of the existing drought indices, indicating that it is more useful for drought interpretation. The joint drought index calculated in this study is considered highly useful in that it compensates for the analytical limits of existing single-variable drought indices and demonstrates the value of drought indices that use unstructured data.
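
The joint index described above can be illustrated with a minimal Python sketch. It assumes that each variable (news, rainfall, dam inflow) has already been converted to a marginal non-exceedance probability in (0, 1], and that the joint probability is mapped to an index through the inverse standard normal CDF, as SPI-style indices commonly do. The function names, the value of theta, and the sample probabilities are hypothetical; the paper's calibration method for the copula parameter is not reproduced here.

```python
from statistics import NormalDist


def clayton_cdf(probs, theta):
    """Clayton copula C(u_1..u_d) = (sum(u_i^-theta) - (d - 1))^(-1/theta)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not probs or any(not 0.0 < u <= 1.0 for u in probs):
        raise ValueError("each marginal probability must lie in (0, 1]")
    total = sum(u ** -theta for u in probs) - (len(probs) - 1)
    return total ** (-1.0 / theta)


def joint_drought_index(probs, theta):
    """Map the joint probability to a standard-normal score (an assumption)."""
    p = clayton_cdf(probs, theta)
    # Clamp so the inverse CDF stays finite at the boundaries.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(p)


# Hypothetical marginal probabilities for news, rainfall, and dam inflow.
example_index = joint_drought_index([0.10, 0.20, 0.15], theta=2.0)
```

A strongly negative value indicates that all three variables are jointly in their dry range, which is the situation a single-variable index cannot distinguish from an isolated shortage.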

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and SNS (social network services) posts and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. 
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
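
The idea of linking issues across consecutive periods by similarity can be sketched in Python as follows. This is an illustration only, not the authors' implementation: it assumes each issue is represented as a keyword-weight dictionary, uses cosine similarity as the similarity measure, and applies a hypothetical threshold of 0.3. The issue names and weights are toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def issue_flow(periods, threshold=0.3):
    """Return (period, issue, next_issue, similarity) edges between
    consecutive periods whose similarity reaches the threshold."""
    edges = []
    for t in range(len(periods) - 1):
        for name_a, vec_a in periods[t].items():
            for name_b, vec_b in periods[t + 1].items():
                sim = cosine(vec_a, vec_b)
                if sim >= threshold:
                    edges.append((t, name_a, name_b, sim))
    return edges


# Toy issues for two consecutive periods (hypothetical weights).
periods = [
    {"nuclear_test": {"nuclear": 0.6, "test": 0.4},
     "trade": {"export": 0.7, "tariff": 0.3}},
    {"sanctions": {"nuclear": 0.5, "sanction": 0.5},
     "separated_families": {"separated": 0.5, "families": 0.5}},
]
edges = issue_flow(periods)
```

An issue with no outgoing edge (here "trade") corresponds to one that appears in a period and disappears in the next; an issue with several incoming or outgoing edges corresponds to a merge or a split.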

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the suitable documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence value. 3) Based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system showed higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. Previous research has the limitation that performance is poor when extracting information from document types that differ from the training data, whereas the proposed methodology extracted information effectively from various types of unstructured documents compared with the baseline model. 
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer. It is meaningful that we provide a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a given document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first concerns data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and extraction can fail when the morphological analysis is incorrect. To improve information extraction results, a more advanced morphological analyzer needs to be developed. The second is entity ambiguity. The information extraction system of this study cannot distinguish between different entities that share the same name. If several people with the same name appear in the news, the system may not extract information about the person the query intends. 
Future research needs measures to distinguish people who share the same name. The third concerns the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system, and we developed an evaluation data set of 2,800 documents (400 questions × 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each document contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents, so that results can be evaluated more objectively.
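
The three-step flow in this abstract (document suitability, sentence suitability, then extraction with an overall confidence) can be sketched in Python as a gated pipeline. Everything below is an assumption made for illustration: the scorer callables stand in for the paper's trained models, the thresholds are arbitrary, and combining the three confidences by multiplication is one simple choice that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    answer: str
    confidence: float


def extract_with_gating(documents, doc_score, sent_score, extract,
                        doc_threshold=0.5, sent_threshold=0.5):
    """Attempt extraction only where document and sentence both look suitable."""
    results = []
    for doc in documents:
        d = doc_score(doc)
        if d < doc_threshold:
            continue  # skip documents unlikely to contain the answer
        for sentence in doc.split(". "):
            s = sent_score(sentence)
            if s < sent_threshold:
                continue  # skip sentences unsuitable for extraction
            hit = extract(sentence)
            if hit is None:
                continue
            answer, e = hit
            # Hypothetical overall confidence: product of the three scores.
            results.append(Extraction(answer, d * s * e))
    return sorted(results, key=lambda r: r.confidence, reverse=True)


# Toy stand-ins for the trained models (hypothetical rules and values).
docs = [
    "Alpha City is large. The population of Alpha City is 100",
    "Beta Town is small. The population of Beta Town is 50",
]
results = extract_with_gating(
    docs,
    doc_score=lambda d: 1.0 if "Alpha City" in d else 0.0,
    sent_score=lambda s: 0.9 if "population" in s else 0.1,
    extract=lambda s: (s.split()[-1], 0.8),
)
```

Because the second document is rejected at the first gate, no extraction is attempted on it, which is the mechanism the abstract credits for keeping precision high on real web documents.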

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.575-584 / 2013
  • Many types of data are used for big data analysis, such as news, blogs, SNS, papers, patents, and sensor data. In particular, the use of web documents, which offer reliable data in real time, is gradually increasing. Web crawlers that collect web documents automatically have grown in importance because big data is being used in many different fields and web data is growing exponentially every year. However, existing web crawlers cannot collect all the web documents in a web site, because they follow only the URLs contained in documents they have already collected. In addition, existing web crawlers may re-collect documents already collected by other crawlers, because information about the documents each crawler has collected is not managed efficiently across crawlers. To resolve these problems, this paper proposes a distributed web crawler. The proposed crawler collects web documents through the RSS feeds of each web site and the Google search API. It provides fast crawling performance through a client-server model based on RMI and NIO that minimizes network traffic. Furthermore, it extracts the core content of a web document by comparing keyword similarity across the tags in the document. Finally, to verify the superiority of our web crawler, we compare it with existing web crawlers in various experiments.
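
Two of the ideas above, seeding the crawl from RSS feeds and avoiding documents that another crawler has already collected, can be sketched in Python with the standard library. This is a single-process illustration under stated assumptions: the paper's crawler is a distributed client-server system based on RMI and NIO, whereas the `SeenRegistry` class, the URL normalization rule, and the sample feed below are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so trivially different forms compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep the query.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def links_from_rss(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link").strip()
            for item in root.iter("item") if item.findtext("link")]


class SeenRegistry:
    """Shared record of collected URLs (in-memory stand-in for a server)."""

    def __init__(self):
        self._seen = set()

    def claim(self, url):
        """Return True only for the first caller to claim this URL."""
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


SAMPLE_RSS = """<rss version="2.0"><channel>
<item><link>http://Example.com/news/1/</link></item>
<item><link>http://example.com/news/1#top</link></item>
<item><link>http://example.com/news/2</link></item>
</channel></rss>"""

registry = SeenRegistry()
new_urls = [u for u in links_from_rss(SAMPLE_RSS) if registry.claim(u)]
```

In a distributed setting the registry would live on the server side so that every crawler client consults the same record before fetching.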

The Effects of Psychological Contract Violation on OS User's Betrayal Behaviors: Window XP Technical Support Ending Case (심리적 계약 위반이 OS이용자의 배신 행동에 미치는 영향: 윈도우 XP 기술적 지원서비스 중단 사례)

  • Lee, Un-Kon
    • Asia pacific journal of information systems / v.24 no.3 / pp.325-344 / 2014
  • Technical support for Windows XP ended on March 8, 2014, leaving OS (operating system) users in a state of confusion. A sudden decision to upgrade or replace an OS is not a simple matter. Firms need to change their long-term capacity plans for enterprise IS management, but they are pressed for the time and cost to do so. Individuals have no choice but to select a second-best plan, because the successors to Windows XP perform below expectations, new PC sales, which provide the opportunity for an OS upgrade, are decreasing, and the potential risk that OS technical support would end was not announced to users at the point of purchase. Microsoft, as the OS vendor, did not present precautions or remedies for this confusion. Rather, Microsoft announced that technical support for successors to Windows XP, such as Windows 7, would end in two years. This conflict between the OS vendor and OS users is unlikely to be a one-time event and could recur in the near future. Although studies on OS user protection policy are needed to avoid this conflict, few prior studies have addressed the issue. This study carefully investigates OS users' reactions in the case of the end of Windows XP technical support: confirmation of the expectations users held at the point of purchase, three types of justice perception regarding the OS vendor's treatment, psychological contract violation, satisfaction, and betrayal behavioral intentions. By adopting justice perception in this research and empirically validating its impact on OS users' reactions, I suggest a direction for establishing the OS vendor's user protection policy. Based on expectation-confirmation theory, the theory of justice, the literature on psychological contract violation, and studies of consumer betrayal behaviors from the perspective of Herzberg's (1968) dual factor theory, I developed the research model and hypotheses. 
Expectation-confirmation theory explains that consumers form expectations about a product's performance at the point of sale and are satisfied with their purchase when those expectations are confirmed at the point of consumption. The theory of justice in social exchange argues that the treated party is willing to accept the treating party's treatment when three types of justice (distributive, procedural, and interactional) are established in that treatment. The literature on psychological contract violation explains that one party to a contract holds an implied contract (also called a 'psychological contract') that the other party will sincerely fulfill it, and is willing to retaliate when the contract is unfairly broken. When consumers' psychological contract is broken, they come to distrust the vendor and tend to reduce beneficial attitudes and behaviors such as satisfaction, loyalty, and repurchase intention. At the same time, they feel betrayed and tend to increase retributive attitudes and behaviors such as negative word-of-mouth, complaints to the vendor, and complaints to third parties for consumer protection. To validate the research model, we conducted a scenario survey in March 2013, when the news was first released, one year before the actual end of Windows XP technical support. We collected valid data from 238 voluntary participants who were OS users but had not yet been exposed to the news of the schedule for ending Windows OS technical support. The subjects were allocated to two groups, and one of the groups was exposed to this news. The data were analyzed with MANOVA and PLS. The MANOVA results indicate that the end of OS technical support significantly decreased all three types of justice perception. 
The PLS results indicate that it significantly increased psychological contract violation, and that this increased violation significantly reduced trust and increased perceived betrayal. In turn, satisfaction, loyalty, and repurchase intention were significantly reduced, and the intentions to spread negative word-of-mouth, to complain to the vendor, and to complain to third parties were significantly increased. All hypotheses were supported. Consequently, OS users feel that the end of OS technical support is not the natural end of a value-added service but a violation of the core OS purchase contract, that it could amount to an after-the-fact prohibition of their right to use the OS, and that it could induce psychological contract violation. This study contributes by introducing to the IS field the psychological contract violation that OS users experience when OS technical support ends, by introducing three types of justice as antecedents of psychological contract violation, and by empirically validating the impact of psychological contract violation on both the beneficial and the retributive behavioral intentions of OS users. For practice, the results could contribute to more comprehensive OS user protection policies and consumer relationship management practices on the part of OS vendors.

Regional Identity and Community Paper: A Search for Subject and Method of Geographical Research (지역정체성 연구와 지역신문의 활용 -지리학적 연구주제의 탐색-)

  • Lee, Young-Min
    • Journal of the Korean association of regional geographers / v.5 no.2 / pp.1-14 / 1999
  • In the course of modernization and globalization, each region in Korea has become deeply subordinated to the center, Seoul, and increasingly exposed to the possibility of colonization by global capital. To overcome this situation, strategies must above all be developed that focus on daily life and life space. The basis for developing such strategies is the establishment of regional identity in life space. This is why life space, or the small-scale region, has drawn wide attention in geographic research in recent years. In particular, humanistic geography and new regional geography have developed the relevant theory and methodology and have continued research on small-scale regions. Generally speaking, there has been a large amount of theoretical discussion of the small-scale region in geography in recent years. Empirical research focusing on a particular small-scale region, however, has rarely been carried out. This seems related to the lack of data and the unclear research framework of small-scale regional geography. A community paper can be very helpful for geographic research on small-scale regions. Because a community paper is published for a county ('gun'), a small or mid-size city ('si'), or a district of a large city ('gu'), it deals with local news and daily life information closely tied to the region. Accordingly, it functions as a medium for the formation of regional identity. It is also a valuable source for validating regional identity and for analyzing identity-shaping mechanisms. Geographic interest in community papers should first take shape through work on the geographical distribution of community papers in Korea and the changes in their publication over time. Another research subject is the examination of a region's characteristics by analyzing the news and advertisements. 
The news in community papers is a valuable data source for regional studies in geography. In addition, the process by which local residents identify with their region through the community paper could and should be explored, and how regional centrality, or self-generation based on identity, is achieved will be an important subject.

Curation Service to Improve User's Access to National R & D Information : Focusing on Issues R&D Service (사용자의 국가 R&D 정보 이용 접근성 향상을 위한 큐레이션 서비스 : 이슈로 보는 R&D 사례를 중심으로)

  • Yu, Eun-ji;Choi, Kwang-Nam;Hwang, Youna
    • The Journal of the Korea Contents Association / v.20 no.9 / pp.1-10 / 2020
  • National R&D data covers information in all fields, from basic science research to industrialization, but it is expressed in technical terms, which makes it difficult for the public to use. Accordingly, NTIS developed and launched a data curation service, the 'R&D issue service', which selects national R&D information on national and social issues and provides it to the public. This study aims to analyze the effect of the data curation service on NTIS users' access to R&D data and to suggest how the curation service should be developed. The R&D issue service extracts issues from news articles and provides the related national R&D projects, achievements, and major research institutes. All raw data used for the service are open to the public, organized in a report format, and provided as PDF files. In addition, an automated process was developed so that all NTIS users can package individual issues as an administrator does. The results show that the launch of the 'R&D issue service' improved users' access to, and the convenience of using, R&D data related to major issues, and that the number of page views increased after the service opened.

Prediction of Onion Purchase Using Structured and Unstructured Big Data (정형 및 비정형 빅데이터를 이용한 양파 소비 예측)

  • Rah, HyungChul;Oh, Eunhwa;Yoo, Do-il;Cho, Wan-Sup;Nasridinov, Aziz;Park, Sungho;Cho, Youngbeen;Yoo, Kwan-Hee
    • The Journal of the Korea Contents Association / v.18 no.11 / pp.30-37 / 2018
  • Social media data and broadcasting data related to onions, together with agri-food consumer panel data, were collected to investigate whether the amount of money spent on onions in 2014, the most recent year in which onion prices plunged, was correlated with the frequency of onion-related keywords in social media and broadcasting programs. The study was motivated by the expectation that onion prices would plunge in 2018 due to overproduction, and by the need to analyze the impact of social media and broadcasting programs on onion purchases in previous similar events and to identify in advance potential factors that can promote onion consumption. We found that a) broadcast news programs mentioning the word "onion" were correlated with onion purchases 3 to 6 weeks in advance; b) broadcast entertainment programs mentioning the words "onion" and "health" were correlated with onion purchases 11 weeks in advance; and c) blogs mentioning the words "onion" and "efficacy" were correlated with onion purchases 5 weeks in advance. Our study provides a case of how social media and broadcasting programs can be analyzed for their effects on consumer purchase behavior using big data collection and analysis in the field of agriculture. We propose that the findings of the study be applied to promote onion consumption.
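
The "weeks in advance" findings describe a lagged correlation between weekly keyword mentions and weekly purchases. A minimal Python sketch of that calculation follows. It assumes two equal-length weekly series and uses the Pearson coefficient; the abstract does not state which correlation measure the authors used, and the numbers below are toy values constructed so that purchases trail mentions by three weeks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(mentions, purchases, lag):
    """Correlate mentions in week t with purchases in week t + lag."""
    if lag < 0 or lag >= len(mentions) - 1:
        raise ValueError("lag out of range")
    return pearson(mentions[:len(mentions) - lag], purchases[lag:])


def best_lag(mentions, purchases, max_lag):
    """Lag (in weeks) with the highest correlation, searched from 0 to max_lag."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(mentions, purchases, k))


# Toy weekly series: purchases repeat the mention pattern 3 weeks later.
mentions = [1, 5, 2, 8, 3, 9, 4, 7, 2, 6, 1, 5]
purchases = [0, 0, 0, 1, 5, 2, 8, 3, 9, 4, 7, 2]
lead_weeks = best_lag(mentions, purchases, max_lag=5)
```

With real data, each lag shortens the overlapping window, so correlations at long lags rest on fewer weeks and should be read with more caution.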

A study on the User Experience at Unmanned Checkout Counter Using Big Data Analysis (빅데이터 분석을 통한 무인계산대 사용자 경험에 관한 연구)

  • Kim, Ae-sook;Jung, Sun-mi;Ryu, Gi-hwan;Kim, Hee-young
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.343-348 / 2022
  • This study aims to analyze consumers' perceived user experience of unmanned checkout counters using SNS big data. For this study, blogs, news, Q&A posts (Knowledge iN), cafes, tips, and web documents on Naver and Daum were analyzed, with 'unmanned checkout counter' as the search keyword. The analysis period was the two years from January 1, 2020 to December 31, 2021. For data collection and analysis, frequency and matrix data were extracted with Textom, and network analysis and visualization were conducted using the NetDraw function of the UCINET 6 program. As a result, perceptions of the unmanned checkout counter were clustered into accessibility, usability, continuous use intention, and others, following the definition of consumers' experience factors. If suppliers spread unmanned checkout counters indiscriminately as a response to minimum wage increases and shorter working hours, a larger employment problem will arise from a social point of view. In addition, institutional measures are needed so that easy and convenient unmanned checkout counters are provided for the elderly, younger generations, children, and foreigners who are not familiar with unmanned payment.

Social Perception of the Invention Education Center as seen in Big Data (빅데이터 분석을 통한 발명 교육 센터에 대한 사회적 인식)

  • Lee, Eun-Sang
    • Journal of the Korea Convergence Society / v.13 no.1 / pp.71-80 / 2022
  • The purpose of this study is to analyze the social perception of invention education centers using big data analysis. For this purpose, data from January 2014 to September 2021 were collected through the Textom website by searching the blogs, cafes, and news channels of NAVER and DAUM with the keyword 'invention+education+center'. The collected data were refined using Textom, and text mining analysis and semantic network analysis were performed with Textom, UCINET 6, and NetDraw. The collected data went through primary and secondary refinement, and 60 keywords were selected based on word frequency. The selected keywords were converted into matrix data and analyzed by semantic network analysis. The text mining analysis confirmed that 'student', 'operation', 'Korea Invention Promotion Association', and 'Korean Intellectual Property Office' were the meaningful keywords. The semantic network analysis identified five clusters: 'educational operation', 'invention contest', 'education process and progress', 'recruitment and support for business', and 'supervision and selection institution'. Through this study, various meaningful social perceptions of the general public regarding invention education centers on the internet could be confirmed. The results of this study can serve as basic data that provide meaningful implications for researchers and policy makers working on invention education.
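
The step of selecting keywords by frequency and converting them into matrix data for network analysis can be sketched in Python as follows. The study used Textom, UCINET 6, and NetDraw for this; the code below is only an illustration of the underlying idea, assuming documents are already tokenized and that two keywords co-occur when they appear in the same document. The toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_keywords(docs, n):
    """Return the n most frequent tokens across all tokenized documents."""
    freq = Counter(token for doc in docs for token in doc)
    return [word for word, _ in freq.most_common(n)]


def cooccurrence_matrix(docs, keywords):
    """Symmetric matrix counting documents in which two keywords both appear."""
    index = {word: i for i, word in enumerate(keywords)}
    size = len(keywords)
    matrix = [[0] * size for _ in range(size)]
    for doc in docs:
        present = sorted({index[t] for t in doc if t in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Toy tokenized documents (hypothetical).
docs = [
    ["student", "invention", "contest"],
    ["student", "education", "operation"],
    ["student", "invention", "education"],
]
keywords = top_keywords(docs, 3)
matrix = cooccurrence_matrix(docs, keywords)
```

The resulting matrix is the kind of input a network tool reads as an adjacency matrix, with keywords as nodes and co-occurrence counts as edge weights.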