• Title/Summary/Keyword: Public Big data

Secure and Efficient Privacy-Preserving Identity-Based Batch Public Auditing with Proxy Processing

  • Zhao, Jining;Xu, Chunxiang;Chen, Kefei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.1043-1063 / 2019
  • By delegating a proxy to process data before outsourcing, data owners with restricted access can enjoy flexible and powerful cloud storage services, but they still face the risk of data integrity breaches. Identity-based data auditing is a critical technology that can address this security concern efficiently and eliminate the complicated management of owners' public key certificates. Recently, Yu et al. proposed Identity-Based Public Auditing for Dynamic Outsourced Data with Proxy Processing (https://doi.org/10.3837/tiis.2017.10.019), which aims to offer identity-based, privacy-preserving batch auditing for multiple owners' data on different clouds while allowing proxy processing. In this article, we first demonstrate that this scheme is insecure: a malicious cloud can pass integrity auditing without holding the original data, and clouds and owners can recover the proxy's private key and thus impersonate it to forge tags for any data. Second, we propose an improved scheme, provably secure in the random oracle model, that achieves secure identity-based, privacy-preserving batch public auditing with proxy processing. Third, theoretical analysis and performance simulation show that our scheme is more efficient than the existing identity-based auditing scheme with proxy processing in terms of the effort required for a single owner and a single cloud, which should benefit secure big data storage if the results extrapolate to real applications.

Automatic Switching of Clustering Methods based on Fuzzy Inference in Bibliographic Big Data Retrieval System

  • Zolkepli, Maslina;Dong, Fangyan;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.256-267 / 2014
  • An automatic switch among ensembles of clustering algorithms is proposed as part of a bibliographic big data retrieval system. It uses a fuzzy inference engine as a decision support tool to select the fastest-performing clustering algorithm among fuzzy C-means (FCM) clustering, Newman-Girvan clustering, and the combination of both. It aims to achieve the best clustering performance while reducing computational complexity from $O(n^3)$ to $O(n)$. The automatic switch is built on a fuzzy logic controller written in Java and accepts three inputs from each clustering result: the number of clusters, the number of vertices, and the time taken to complete the clustering process. Experimental results on a PC (Intel Core i5-3210M at 2.50 GHz) demonstrate that the combination of both clustering algorithms is selected as the best-performing algorithm in 20 out of 27 cases, with the highest percentage of 83.99%, completed in 161 seconds. The self-adapted FCM is selected as the best-performing algorithm in 4 cases and Newman-Girvan in 3 cases. The automatic switch is to be incorporated into the bibliographic big data retrieval system, which focuses on visualizing fuzzy relationships using a hybrid approach combining the FCM and Newman-Girvan algorithms, and which is planned for public release over the Internet.

Building Linked Big Data for Stroke in Korea: Linkage of Stroke Registry and National Health Insurance Claims Data

  • Kim, Tae Jung;Lee, Ji Sung;Kim, Ji-Woo;Oh, Mi Sun;Mo, Heejung;Lee, Chan-Hyuk;Jeong, Han-Young;Jung, Keun-Hwa;Lim, Jae-Sung;Ko, Sang-Bae;Yu, Kyung-Ho;Lee, Byung-Chul;Yoon, Byung-Woo
    • Journal of Korean Medical Science / v.33 no.53 / pp.343.1-343.8 / 2018
  • Background: Linkage of public healthcare data is useful in stroke research because patients may visit different sectors of the health system before, during, and after stroke. Therefore, we aimed to establish high-quality big data on stroke in Korea by linking an acute stroke registry with national health claim databases. Methods: Acute stroke patients (n = 65,311) with claim data suitable for linkage were included in the Clinical Research Center for Stroke (CRCS) registry during 2006-2014. We linked the CRCS registry with the national health claim databases of the Health Insurance Review and Assessment Service (HIRA). Linkage was performed using 6 common variables: birth date, gender, provider identification, receiving year and number, and statement serial number in the benefit claim statement. For matched records, linkage accuracy was evaluated using the difference between the hospital visiting date in the CRCS registry and the commencement date for health insurance care in HIRA. Results: Of 65,311 CRCS cases, 64,634 were matched to HIRA cases (match rate, 99.0%). The proportion of true matches was 94.4% (n = 61,017) in the matched data. Among true matches (mean age 66.4 years; men 58.4%), the median National Institutes of Health Stroke Scale score was 3 (interquartile range 1-7). When baseline characteristics were compared between true matches and false matches, no substantial difference was observed for any variable. Conclusion: We were able to establish big data on stroke by linking the CRCS registry and HIRA records, using claims data without personal identifiers. We plan to conduct national stroke research and improve stroke care using the linked big database.
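
The linkage step described in this abstract can be sketched as a deterministic join on a composite key. This is only an illustrative sketch, not the authors' implementation: the field names in `ClaimKey`, the row layout (dicts with a `"key"` entry), and the `tolerance_days` threshold used to call a match "true" are all assumptions, since the abstract names the six variables and the two dates but not the data layout or the cut-off.

```python
from datetime import date
from typing import NamedTuple


class ClaimKey(NamedTuple):
    """The six linkage variables; field names are hypothetical."""
    birth_date: str
    gender: str
    provider_id: str
    receiving_year: int
    receiving_number: str
    statement_serial: str


def link_records(registry, claims):
    """Pair each registry row with the claim row that shares its key, if any."""
    claims_by_key = {claim["key"]: claim for claim in claims}
    return [
        (row, claims_by_key[row["key"]])
        for row in registry
        if row["key"] in claims_by_key
    ]


def is_true_match(visit_date, claim_start, tolerance_days=1):
    """Assumed accuracy rule: the two dates agree within a tolerance.

    The tolerance value is an assumption, not taken from the source.
    """
    return abs((visit_date - claim_start).days) <= tolerance_days


def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)
```

With the counts reported in the abstract, `percentage(64634, 65311)` gives 99.0 and `percentage(61017, 64634)` gives 94.4, which agrees with the stated match rate and true-match proportion.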

A Case Study of Basic Data Science Education using Public Big Data Collection and Spreadsheets for Teacher Education (교사교육을 위한 공공 빅데이터 수집 및 스프레드시트 활용 기초 데이터과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education / v.25 no.3 / pp.459-469 / 2021
  • This paper presents a case study of basic data science practice education for in-service and pre-service teachers. Spreadsheet software was used as the data collection and analysis tool for basic data science education. Participants were then trained in statistics for data processing, predictive hypotheses, and predictive model verification. In addition, an educational case was proposed in which thousands of public big data records are collected and processed to verify population prediction hypotheses and prediction models. A 34-hour, 17-week curriculum using a spreadsheet tool was presented to cover this basic data science content. As a tool for data collection, processing, and analysis, spreadsheets, unlike Python, do not impose the burden of learning programming languages and data structures, and have the advantage of letting learners visually learn the theories of processing and analyzing qualitative and quantitative data. As a result of this educational case study, three predictive hypothesis test cases were presented and analyzed. First, quantitative public data were collected to verify a hypothesis predicting a difference in mean values between groups of the population. Second, qualitative public data were collected to verify a hypothesis predicting an association within the qualitative data of the population. Third, quantitative public data were collected to verify a regression prediction model based on a hypothesis predicting a correlation within the quantitative data of the population. Finally, the effectiveness of this educational case in data science education was analyzed through a satisfaction analysis of pre-service and in-service teachers.

A Case Study of Producing Infographics Using Tableau Public (Tableau Public을 이용한 인포그래픽 제작 사례연구)

  • Kim, Dong Hwan
    • Spatial Information Research / v.23 no.2 / pp.21-29 / 2015
  • Recently, as the volume of data has grown, many media outlets and organizations have focused on big data, data visualization, information visualization, and infographics. Domestically, Chosun.com and Hankyoreh online have advanced in the data visualization field, and internationally the Guardian, the Wall Street Journal, and the New York Times are the leading companies in that area. Until now, many people in Korea have regarded infographics as a design-oriented product. However, Tableau Public, one of the major data visualization programs, can visualize data more efficiently. In this paper, a Data Visualization Methods Quadrant for Policy Making is defined, and data analysis and infographic production are carried out. Open data from the World Bank were used, and two findings were drawn from the number of passenger cars per 1,000 people. First, in the high-income group, the higher the GNI per capita, the smaller the Slope, whereas in the middle-income group, a higher GNI per capita has a positive effect on the Slope. Second, during the global financial crisis, the car ownership rate was about 1.7 times that of the usual state of the global economy. Through the case study, this paper suggests that the direction of infographic production should shift from design-oriented to data-oriented. Moreover, data-oriented infographics should be promoted as a means of scientific research and policy making.

Big Data Application for Judgment on Consumer's Awareness of the Trademark (상표의 소비자 인식 판단을 위한 빅데이터 활용 방안)

  • You, Hyun-Woo;Lee, Hwan-soo
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.6 no.8 / pp.399-408 / 2016
  • With the arrival of the Big Data age, the use of Big Data is also increasing in the intellectual property sector. Meanwhile, the essential purpose of a trademark, which distinguishes the source of goods, is to enable the public to recognize the goods. Big Data technologies, which have recently become an issue, can be used as a tool to judge consumers' awareness of a trademark. Judging trademark awareness through traditional methods has been difficult. Survey methodology has received attention as a new approach and has been applied in the field of trademark law. However, various problems such as cost, time, objectivity, and fairness have been observed. To overcome these limitations, this study proposes a new approach that uses big data analytics to judge consumers' awareness of a trademark. This approach will not only contribute to enhancing the objectivity of judging trademark awareness but can also be used to support related legal judgments.

The Detection Model of Disaster Issues based on the Risk Degree of Social Media Contents (소셜미디어 위험도기반 재난이슈 탐지모델)

  • Choi, Seon Hwa
    • Journal of the Korean Society of Safety / v.31 no.6 / pp.121-128 / 2016
  • Social media has transformed information traffic that was once based on mass media, and it has become a key resource for finding value in enterprises and public institutions. Particularly with regard to disaster management, the need to develop public participation policies through the use of social media is emphasized. The National Disaster Management Research Institute developed the Social Big Board, a system that monitors social Big Data in real time for the purpose of implementing social media disaster management. Social Big Board collects a daily average of 36 million Korean-language tweets in real time and automatically filters disaster-safety-related tweets. The filtered tweets are then automatically categorized into 71 disaster safety types. This real-time tweet monitoring system provides various information and insights based on the tweets, such as disaster issues, tweet frequency by region, and original tweets. The purpose of the system is to take advantage of the potential benefits of social media in relation to disaster management. It is a first step towards disaster management that communicates with the people, allowing us to hear their voice concerning disaster issues and to understand their emotions at the same time. In this paper, the Social Big Board, which is based on Korean-language text mining, is briefly introduced, and the disaster issue detection model, its key algorithm, is described. Disaster issues are divided into two categories: potential issues, which refer to abnormal signs prior to disaster events, and occurrence issues, which are notifications of disaster events. The detection models for these two categories are defined, and their performance is compared and evaluated.

An Extraction Method of Sentiment Information from Unstructured Big Data on SNS (SNS상의 비정형 빅데이터로부터 감성정보 추출 기법)

  • Back, Bong-Hyun;Ha, Ilkyu;Ahn, ByoungChul
    • Journal of Korea Multimedia Society / v.17 no.6 / pp.671-680 / 2014
  • Recently, with the remarkable growth of social network services (SNS), it has become necessary to extract interesting information from the large volume of data about individual opinions and preferences on SNS. Sentiment information can be applied to various fields of society such as politics, public opinion, economics, personal services, and entertainment. Extracting sentiment information requires processing techniques that store a large amount of SNS data, extract meaningful data from it, and search for the sentiment information. This paper proposes an efficient method to extract sentiment information from various unstructured big data on social networks using the HDFS (Hadoop Distributed File System) platform and MapReduce functions. In experiments, the proposed method collects and accumulates data steadily as the amount of data increases. When the proposed functions are applied to sentiment analysis, the system maintains load balancing and the analysis results are very close to the results of manual work.
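
The MapReduce pattern this abstract relies on can be illustrated in plain Python, without Hadoop. The sketch below is an assumption-laden toy, not the paper's method: the tiny `LEXICON`, the whitespace tokenizer, and the function names are all hypothetical, and the three stages simply run in one process to show how map, shuffle, and reduce fit together for sentiment counting.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; a real system would use a far larger one.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def map_post(post):
    """Map step: emit (label, 1) for every lexicon word found in one post."""
    for word in post.lower().split():
        score = LEXICON.get(word.strip(".,!?"))
        if score is not None:
            yield ("positive" if score > 0 else "negative", 1)


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(posts):
    """Run map, shuffle, and reduce over a list of post strings."""
    pairs = (pair for post in posts for pair in map_post(post))
    return reduce_counts(shuffle(pairs))
```

On a cluster, the map and reduce functions would run on separate nodes over HDFS blocks; here they are chained locally only to make the data flow visible.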

The Perception of Gorpcore Look Using Big Data (빅 데이터를 활용한 고프코어 룩에 대한 인식)

  • Ji-Woo Kim;Jeong-Mee Kim
    • Journal of the Korea Fashion and Costume Design Association / v.25 no.4 / pp.77-92 / 2023
  • The purpose of this study is to investigate the public perception of Gorpcore through Big Data analytics. The study was based on Big Data collected on the word 'Gorpcore' through Textom from July 24, 2017 to March 31, 2023. As a result, 63,386 words were collected from a total of 18,879 posts, and the top 50 words were determined based on frequency of appearance. Based on the collected words, centrality measures and the CONCOR algorithm were computed in Ucinet 6. The research results are as follows. 1) The frequency of appearance was high in the order of 'Gorpcore look', 'fashion', 'coordination', 'clothes', 'outdoor', 'Musinsa', 'look', 'trend', 'brand' and 'ahjussi' (Korean for a middle-aged man). These words had high TF-IDF scores, which leads to the conclusion that they are key words recognized as important. 2) Network centrality shows that 'Gorpcore look', 'fashion', 'outdoor', 'coordination', 'clothes', 'trend', 'look' and 'style' have a high correlation with other words. Through this, it was found that the public thinks it is important to create a variety of fashions by styling high-performance outdoor wear and casual wear, and that they are highly interested in clothes and in brands leading the Gorpcore trend. 3) As a result of the CONCOR algorithm, four significant groups were formed. The words that appear in each group are as follows. Group 1 - 'outdoor', 'Gorp', 'Normcore', 'hiking', 'functionality', 'new', 'sports', 'casual wear', 'activity', 'generation', 'collaboration'. Group 2 - 'fashion', 'trend', 'look', 'brand', 'style', 'shoes', 'ugly', 'item', 'trend', 'product', 'Salomon', 'padded jacket', 'stylishness', 'utilization', 'Winter', 'street', 'design', 'retro', 'popular', 'styling'. Group 3 - 'Gorpcore look', 'coordination', 'Musinsa', 'windbreaker', 'recommendation', 'Arcteryx', 'pants', 'man'. Group 4 - 'clothes', 'ahjussi', 'jacket', 'launching', 'spring', 'The North Face', 'collection', 'utility', 'jumper'. As a result, it can be seen that Gorpcore is also regarded as a part of outdoor, fashion, coordination, and casual wear.
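
The frequency and TF-IDF ranking mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: posts are taken to be already tokenized into word lists, the function names are hypothetical, and the weighting used is the textbook form (corpus term frequency times log(N / document frequency)). The exact formula Textom applies is not given in the abstract.

```python
import math
from collections import Counter


def term_frequencies(posts):
    """Count how often each term appears across all tokenized posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(tokens)
    return counts


def tf_idf(posts):
    """Score each term as corpus frequency times log(N / document frequency)."""
    n_posts = len(posts)
    tf = term_frequencies(posts)
    df = Counter()
    for tokens in posts:
        df.update(set(tokens))  # each post counts a term at most once
    return {term: tf[term] * math.log(n_posts / df[term]) for term in tf}
```

Under this plain definition a term that occurs in every post scores zero, so tools often add smoothing or a different normalization; the scores reported in the study need not follow this exact form.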

Sentiment analysis of nuclear energy-related articles and their comments on a portal site in Rep. of Korea in 2010-2019

  • Jeong, So Yun;Kim, Jae Wook;Kim, Young Seo;Joo, Han Young;Moon, Joo Hyun
    • Nuclear Engineering and Technology / v.53 no.3 / pp.1013-1019 / 2021
  • This paper reviewed the temporal changes in public opinion on nuclear energy in Korea through a big data analysis of nuclear energy-related articles and their comments posted on the portal site NAVER. All articles that included at least one of "nuclear energy," "nuclear power plant (NPP)," "nuclear power phase-out," or "anti-nuclear" in their titles or main text were extracted from those posted on NAVER from January 2010 to December 2019. First, we performed annual word frequency analysis to identify which words appeared most frequently in the articles. For that period, the most frequent words were "NPP," "nuclear energy," and "energy." In addition, "safety" has remained in the upper ranks since the Fukushima NPP accident. Then, we performed sentiment analysis of the pre-processed articles, which showed that positive-tone articles were reported more frequently than negative-tone articles over the entire analysis period. Last, we performed sentiment analysis of the comments on the articles to examine the public's intention regarding nuclear issues. The analysis showed that the number of negative comments on articles each month, irrespective of the articles' positive or negative tone, was always larger than the number of positive comments over the entire analysis period.
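
The article selection and annual word frequency steps described above can be sketched as below. This is an illustrative sketch only: the English `KEYWORDS` stand in for the Korean search terms actually used, the article layout (dicts with `year`, `title`, and `body`) is hypothetical, and whitespace tokenization is a simplification, since Korean text would normally need morphological analysis.

```python
from collections import Counter, defaultdict

# English stand-ins for the search terms listed in the abstract (assumption).
KEYWORDS = (
    "nuclear energy",
    "nuclear power plant",
    "nuclear power phase-out",
    "anti-nuclear",
)


def is_relevant(article):
    """Keep an article if its title or body contains at least one keyword."""
    text = (article["title"] + " " + article["body"]).lower()
    return any(keyword in text for keyword in KEYWORDS)


def annual_word_frequency(articles):
    """Count body words per year over the relevant articles only."""
    by_year = defaultdict(Counter)
    for article in articles:
        if is_relevant(article):
            by_year[article["year"]].update(article["body"].lower().split())
    return by_year
```

Calling `annual_word_frequency(articles)[year].most_common(10)` would then list the ten most frequent words for a given year.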