Optimistic Concurrency Control based on 2-Version and TimeStamp for Broadcast Environment : OCC/2VTS (방송환경에서 이중 버전과 타임스탬프에 기반을 둔 낙관적 동시성 제어 기법)

  • Lee, Uk-Hyun;Hwang, Bu-Hyun
    • The KIPS Transactions:PartD / v.8D no.2 / pp.132-144 / 2001
  • A broadcast environment is asymmetric: the communication capacity available from the server to clients is typically much greater than in the opposite direction. In addition, most mobile computing systems allow mobile clients to generate only read-only transactions, which retrieve information such as stock data, traffic information, and news updates. Because previous concurrency control protocols do not consider these characteristics, performance degrades when they are applied to a broadcast environment with high data contention. In this paper, we propose OCC/2VTS (Optimistic Concurrency Control based on 2-Version and TimeStamp), a protocol well suited to the broadcast environment. OCC/2VTS lets each client process and commit query transactions locally by using two versions of each data item in its cache. If the values of the relevant data items are not changed twice by invalidation reports after a query transaction starts, the query transaction commits safely, independent of the commitment of update transactions. OCC/2VTS reduces the number of times a client must contact the server in order to commit. Because the broadcast validation reports include the most recently updated values, it also reduces the need to request recent data values from the server. As a result, OCC/2VTS makes full use of the asymmetric bandwidth and improves transaction throughput by increasing the commit ratio of query transactions.
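
The commit rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the class and function names (`TwoVersionCache`, `apply_report`, `run_query`), the integer timestamps, and the rule "read the version that was current when the query started" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


class QueryAborted(Exception):
    """Raised when no cached version is old enough for the query."""


@dataclass
class Version:
    value: Any
    ts: int  # timestamp of the update that produced this value


class TwoVersionCache:
    """Hypothetical client cache that keeps two versions per data item."""

    def __init__(self) -> None:
        # key -> (previous version or None, current version)
        self._items: Dict[str, Tuple[Optional[Version], Version]] = {}

    def apply_report(self, key: str, value: Any, ts: int) -> None:
        """Install an updated value taken from a broadcast report.

        The old current version becomes the previous one and anything
        older is dropped, so at most two versions are kept per item.
        """
        entry = self._items.get(key)
        previous = entry[1] if entry is not None else None
        self._items[key] = (previous, Version(value, ts))

    def read(self, key: str, start_ts: int) -> Any:
        previous, current = self._items[key]
        if current.ts <= start_ts:
            return current.value   # unchanged since the query started
        if previous is not None and previous.ts <= start_ts:
            return previous.value  # changed once: the older version is still usable
        raise QueryAborted(key)    # changed twice (or first cached after the start)


def run_query(cache: TwoVersionCache, keys: Iterable[str], start_ts: int) -> Optional[dict]:
    """Commit locally and return the values read, or return None on abort."""
    try:
        return {key: cache.read(key, start_ts) for key in keys}
    except QueryAborted:
        return None
```

In this sketch a query that started at timestamp 5 still commits after one later update to an item, because the previous version is served from the cache, and it aborts only after a second update arrives.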

Korean Semantic Role Labeling Using Domain Adaptation Technique (도메인 적응 기술을 이용한 한국어 의미역 인식)

  • Lim, Soojong;Bae, Yongjin;Kim, Hyunki;Ra, Dongyul
    • Journal of KIISE / v.42 no.4 / pp.475-482 / 2015
  • Developing a high-performance Semantic Role Labeling (SRL) system for a domain requires a large amount of manually annotated training data from the same domain. However, SRL training data of sufficient size is available for only a few domains. The performance of Korean SRL degrades by almost 15% or more when it is applied directly to another domain with relatively little training data. This paper proposes two techniques to minimize performance degradation in domain transfer. First, we propose a domain adaptation algorithm for Korean SRL based on the prior model, one of the domain adaptation paradigms. Second, we propose using simplified features related to morphological and syntactic tags to mitigate data sparseness when the target-domain data is small. We experimentally compared other domain adaptation techniques with ours, using news as the source domain and Wikipedia as the target domain. The highest performance was achieved when our two techniques were applied together. Our system achieved an F1 score of 64.3%, which is 2.4~3.1% higher than the methods from other research.

Analysis on Media Reports of the 「Security Services Industry Act」 Using News Big Data -Focusing on the Period from 1990 to 2021-

  • Cho, Cheol-Kyu;Park, Su-Hyeon
    • Journal of the Korea Society of Computer and Information / v.27 no.5 / pp.199-204 / 2022
  • The purpose of this study is to broaden the understanding of the Security Services Industry Act and to examine the meaning of various phenomena by analyzing big data from media reports on the Act rather than relying on researchers' perspectives. The study searched for the keyword 「Security Services Industry Act」, the law that prescribes security work as an important part of crime prevention and the maintenance of public order in Korea. The data covers 1990 to 2021, the range that BIG KINDS could provide. For a more concrete analysis, the period was divided into a settlement period (1976~2001), a quantitative growth period (2002~2012), and a qualitative growth period (2013~2021). The results show that media reports on the Security Services Industry Act have continuously emphasized the social role and importance of private security over time. The resulting marketability of private security will play a great role in protecting people's lives and property in combination with various other industries in the future. However, the various social issues caused by legal regulations and illegal practices could hinder the development of the private security industry, which provides public safety services together with the police, so its responsibilities and roles need to be strengthened accordingly.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method for identifying which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process that consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Of these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two seed words, 'daechul (loan)' and 'sachae (private loan)', and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether each item was related to cybercriminality, particularly financial fraud. We then selected some of the vocabulary as keywords if it consisted of nominals or symbols. With the selected keywords, we searched web sources such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. Preprocessing is divided into three steps: performing morphological analysis, removing stop words, and selecting valid parts of speech. In the morphological analysis step, a complex sentence is split into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text.
In the part-of-speech selection step, only two kinds, nouns and symbols, are retained. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the selected data into learning data, each item must be classified as legitimate or not, so each item is labeled 'legal' or 'illegal'. The processed data is then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set. In this study, we used 70% of the data for learning and 30% for testing. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, which is clearly superior to that of Term Frequency, MLE, and the other methods. Hence, the results suggest that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis. The study should also be useful to practitioners in the fields of brand management and opinion mining.
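
The data preparation steps in this abstract (cleaning, Document-Term Matrix, 70/30 split) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: whitespace tokenization stands in for Korean morphological analysis, part-of-speech filtering is omitted, and the function names are hypothetical. The SVM step itself (gamma 0.5, cost 10) is left out because it needs a machine learning library.

```python
import random
import re
from collections import Counter
from typing import List, Sequence, Tuple


def preprocess(text: str) -> List[str]:
    """Drop numbers and repeated spaces, then split on whitespace.

    Assumption: a real pipeline would run a morphological analyzer and
    keep only nouns and symbols; plain splitting is a stand-in here.
    """
    text = re.sub(r"\d+", " ", text)                  # remove numbers
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse double spaces
    return text.split(" ") if text else []


def document_term_matrix(token_lists: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in document i."""
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


def split_70_30(rows: Sequence, labels: Sequence, seed: int = 0):
    """Shuffle the labeled documents and split them 70% / 30%."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 7 // 10
    train = [(rows[i], labels[i]) for i in indices[:cut]]
    test = [(rows[i], labels[i]) for i in indices[cut:]]
    return train, test
```

The `train` and `test` lists would then be passed to whatever classifier is used; fixing the `seed` keeps the random split reproducible.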

A study on the Domestic Consumer's Perception of "Hansik" with Big Data Analysis : Using Text Mining and Semantic Network Analysis (빅데이터를 통한 내국인의 '한식' 인식 연구 : 텍스트마이닝과 의미연결망 중심으로)

  • Park, Kyeong-Won;Yun, Hee-Kyoung
    • Journal of the Korea Convergence Society / v.11 no.6 / pp.145-151 / 2020
  • 'Hansik', or Korean cuisine, is one of Korea's national brands. To understand domestic consumers' awareness of Korean cuisine, data was gathered using the search keyword 'Hansik'. Textom 3.5 was used to gather data from blogs and news media on Naver from November 1, 2018, to October 31, 2019. The results of frequency and TF-IDF analysis indicate that 'buffet' had the largest share in consumer awareness of Hansik. Broadcast content featuring star chefs also had a great influence. Awareness of Hansik was not confined to its traditional character but extended into areas such as fusion and gourmet cuisine. UCINET6 and NetDraw were used to conduct CONCOR analysis. Four clusters were found: a cluster of various food cultures, a high-end restaurant cluster referring to restaurants featured in the media, a Hansik brand cluster, and a Hansik buffet cluster. This study proposes offering a varied Hansik menu that uses multiple ingredients. In addition, promotions that introduce fine Hansik, together with the development of marketing and media content about convenient HMRs, would strengthen the imagery associated with Hansik.
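
For readers unfamiliar with the TF-IDF weighting mentioned in this abstract, here is a minimal sketch. It uses the common textbook form (raw term count times the natural log of N over document frequency); the exact variant computed by Textom is not stated in the abstract, so treat the formula and the function name `tf_idf` as assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(documents: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in documents:
        df.update(set(tokens))
    scores = []
    for tokens in documents:
        tf = Counter(tokens)  # raw term counts in this document
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores
```

With this form, a word that occurs in every document (such as the search keyword itself) receives a weight of zero, while a word concentrated in a few documents is weighted up.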

Multivariate Analysis of Factors for Search on Suicide Using Social Big Data (소셜 빅 데이터를 활용한 자살검색 요인 다변량 분석)

  • Song, Tae Min;Song, Juyoung;An, Ji-Young;Jin, Dallae
    • Korean Journal of Health Education and Promotion / v.30 no.3 / pp.59-73 / 2013
  • Objectives: This study aims to examine the individual reasons and regional/environmental factors behind online searches on suicide using social big data, in order to predict actual suicide-related behaviors and to develop an online suicide prevention system at the governmental level. Methods: The study used suicide-related social big data collected from online news sites, blogs, cafés, social network services, and message boards between January 1 and December 31, 2011 (321,506 buzzes from users assumed to be adults and 67,742 buzzes from those assumed to be teenagers). Technical analysis and development of the suicide search prediction model were done using SPSS 20.0, and the structural model and multi-group analysis were done using AMOS 20.0. HLM 7.0 was applied for the multilevel model analysis of the determinants of searches on suicide by teenagers. Results: The results of the multivariate analysis are summarized as follows. First, searches on suicide by adults appeared to increase on days with a higher number of suicide incidents, a higher number of searches on drinking, a higher divorce rate, a lower birth rate, and higher average humidity. Second, searches on suicide by teenagers rose on days with a higher number of teenage suicide incidents, a higher number of searches on stress or drinking, and fewer fine dust particles. Third, a comparison of the structural equation model results for adults and teenagers showed that teenagers were more likely to proceed from searches on stress to searches on sports, drinking, and suicide, while adults tended significantly to move from searches on drinking to searches on suicide. Fourth, the multilevel model analysis of the determinants of searches on suicide by teenagers showed that the monthly teenage suicide rate and average humidity had a positive effect on the amount of searches on suicide.
Conclusions: The study shows that both adults and teenagers experience stress for various reasons and search on suicide on the Internet. Therefore, we need to develop diverse school-level programs that help relieve teenagers' stress and workplace-level programs that reduce adults' work-related stress.

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people now post about their personal interests on social media. The development of the Internet and computer technology has made it possible to store documents in digital form, which has led to an explosion in the amount of textual data generated and, in turn, to increased demand for technology that extracts valuable information from large numbers of documents. Text mining techniques are often used because text data is mostly unstructured and therefore not directly suitable for statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing, and a term-document matrix was generated. The term dictionary was then used to calculate the annual frequency of each term and to observe changes in the relative frequency of frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. In this way we analyzed trends in the Meteorological Yearbook regarding weather-related news, weather conditions, and the work the KMA focused on. This study aims to provide useful information that can help analyze and improve meteorological services and be reflected in meteorological policy.
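
The trend analysis described in this abstract (annual relative frequency of a term, then a regression to see whether it rises or falls) can be sketched as below. The abstract does not specify the regression model, so simple least-squares regression of relative frequency on year is assumed here, and the function names and input layout are hypothetical.

```python
from typing import Dict, Mapping


def relative_frequencies(term: str, yearly_counts: Mapping[int, Mapping[str, int]]) -> Dict[int, float]:
    """Share of `term` among all term occurrences, per year (years sorted)."""
    series = {}
    for year in sorted(yearly_counts):
        counts = yearly_counts[year]
        total = sum(counts.values())
        series[year] = counts.get(term, 0) / total if total else 0.0
    return series


def trend_slope(series: Mapping[int, float]) -> float:
    """Least-squares slope of relative frequency against year.

    A positive slope suggests an increasing trend, a negative slope a
    decreasing one. Requires at least two distinct years.
    """
    years = list(series)
    values = [series[y] for y in years]
    mean_x = sum(years) / len(years)
    mean_y = sum(values) / len(values)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx
```

A fuller analysis would also test whether each slope differs significantly from zero before calling a term increasing or decreasing; that step is omitted from this sketch.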

Social Factors Affecting Internet Searches on Cyber Bullying in Korea and America Using Social Big Data and Google Search Trends (소셜 빅데이터와 Google 검색트렌드를 활용한 한국과 미국의 사이버불링 검색에 영향을 미치는 요인 분석)

  • Song, Tae-Min;Song, Juyoung;Cheon, Mi-Kyung
    • The Journal of Bigdata / v.1 no.1 / pp.67-75 / 2016
  • The study analyzed big data extracted from Google and social media to identify factors related to searches on cyber bullying in Korea and America. The analysis for Korea used social big data collected from online news sites, blogs, cafés, social network services, and message boards between January 1, 2011 and March 31, 2013. Google search trends for the search words stress, exercise, drinking, and cyber bullying were obtained for the period between January 1, 2004 and December 22, 2013. The main results of this study were as follows. First, stress was a more significant factor for cyber bullying in Korea than in America. Second, a positive relationship was found between stress and drinking, exercise, and cyber bullying in both Korea and America. Third, significant differences were found in all paths for both Korea and America. The study shows that both adults and teenagers are influenced in Korea. We need to develop an online application that can intervene in real time when cyber bullying behavior is predicted, because such searches reflect actual exposure to cyber bullying-related psychological and behavioral characteristics.

Analysis of related words for each private security service through collection of unstructured data

  • Park, Su-Hyeon;Cho, Cheol-Kyu
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.219-224 / 2020
  • The purpose of this study is to provide a theoretical basis for the private security industry by analyzing the perception and flow of private security in press materials, classified by period and by duty, through 'Big Kinds', a website for analyzing news big data. As the research method, scattered unstructured data was converted into structured data to allow analysis, and the keyword trends and related words for each private security duty were analyzed for the growth period of private security. According to the results, private security was frequently exposed in the media through various crimes, accidents, and incidents, and through issues related to permanent positions. It also tended to be perceived as simple guard work rather than recognized as a field of private security, and, judging from the high correlation between private security and the police, it was recognized not only as assisting the police force but also as a joint agent in charge of public safety. Therefore, the perception of private security should be judged objectively, and this should become a foundation for recognizing private security as a main agent responsible for the safety of the nation and the maintenance of social order.

Social Big Data-based Co-occurrence Analysis of the Main Person's Characteristics and the Issues in the 2016 Rio Olympics Men's Soccer Games (소셜 빅데이터 기반 2016리우올림픽 축구 관련 이슈 및 인물에 대한 연관단어 분석)

  • Park, SungGeon;Lee, Soowon;Hwang, YoungChan
    • 한국체육학회지인문사회과학편 / v.56 no.2 / pp.303-320 / 2017
  • This paper seeks to better understand the focal issues and persons related to the Rio Olympic soccer games through social data science and analytics. The study collected online news articles and comments about KOR during the Olympic football games. To investigate public interest in each game and in the target persons, the study performed a co-occurrence word analysis and then used the NodeXL software to visualize the results. The study found several major issues during the Rio Olympic men's football games, including the match between KOR and FIJ, KOR player Heungmin Son, commentator Young-Pyo Lee, and sportscaster Woo-Jong Jo. The study also showed that public opinion expressed positive words toward the South Korean national football team during the Rio Olympics, although negative words appeared as well. Furthermore, the study revealed a positive attitude toward the commentators and casters. In conclusion, public interest in big sporting events can be increased by providing content that includes varied professional sports analysis, a capable and thoroughly prepared domain expert, and a commentator and/or caster who is well-spoken and has artistic sense and explanatory power. Multidisciplinary research that combines sports science, social science, information technology, and media can contribute to a wide range of theoretical studies and practical developments within the sports industry.
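
A co-occurrence word analysis of the kind mentioned in this abstract can be sketched as follows. The counting unit is an assumption: here two words co-occur when they appear in the same tokenized article or comment, and each pair is counted once per document. The function name is hypothetical, and the resulting pair counts are the sort of edge list a network tool could visualize.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def cooccurrence_counts(documents: Sequence[Sequence[str]]) -> Counter:
    """Count, for every word pair, the number of documents containing both.

    Each pair is stored as an alphabetically sorted tuple so that
    (a, b) and (b, a) are treated as the same pair.
    """
    pairs: Counter = Counter()
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Calling `cooccurrence_counts(docs).most_common(10)` on the tokenized articles would list the ten word pairs that appear together most often.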