• Title/Summary/Keyword: Big data Processing

Search Result 1,063, Processing Time 0.025 seconds

Secure and Efficient Privacy-Preserving Identity-Based Batch Public Auditing with Proxy Processing

  • Zhao, Jining;Xu, Chunxiang;Chen, Kefei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.1043-1063
    • /
    • 2019
  • With delegating proxy to process data before outsourcing, data owners in restricted access could enjoy flexible and powerful cloud storage service for productivity, but still confront with data integrity breach. Identity-based data auditing as a critical technology, could address this security concern efficiently and eliminate complicated owners' public key certificates management issue. Recently, Yu et al. proposed an Identity-Based Public Auditing for Dynamic Outsourced Data with Proxy Processing (https://doi.org/10.3837/tiis.2017.10.019). It aims to offer identity-based, privacy-preserving and batch auditing for multiple owners' data on different clouds, while allowing proxy processing. In this article, we first demonstrate this scheme is insecure in the sense that malicious cloud could pass integrity auditing without original data. Additionally, clouds and owners are able to recover proxy's private key and thus impersonate it to forge tags for any data. Secondly, we propose an improved scheme with provable security in the random oracle model, to achieve desirable secure identity based privacy-preserving batch public auditing with proxy processing. Thirdly, based on theoretical analysis and performance simulation, our scheme shows better efficiency over existing identity-based auditing scheme with proxy processing on single owner and single cloud effort, which will benefit secure big data storage if extrapolating in real application.

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.7
    • /
    • pp.494-499
    • /
    • 2015
  • As the size of data increases, it becomes important to identify properties by analyzing big data. In this paper, we propose a k-Means based efficient clustering technique, called MCSKMeans (Multi centroid set k-Means), using distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that the accuracy of clustering depends on initial centroids created randomly. To alleviate this problem, the MCSK-Means algorithm reduces the dependency of initial centroids using sets consisting of k centroids. In addition, we apply the agglomerative hierarchical clustering technique for creating k centroids from centroids in m centroid sets which are the results of the clustering phase. In this paper, we implemented our MCSK-Means based on the MapReduce framework for processing big data efficiently.

Real-time emotion analysis service with big data-based user face recognition (빅데이터 기반 사용자 얼굴인식을 통한 실시간 감성분석 서비스)

  • Kim, Jung-Ah;Park, Roy C.;Hwang, Gi-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.18 no.2
    • /
    • pp.49-54
    • /
    • 2017
  • In this paper, we use face database to detect human emotion in real time. Although human emotions are defined globally, real emotional perception comes from the subjective thoughts of the judging person. Therefore, judging human emotions using computer image processing technology requires high technology. In order to recognize the emotion, basically the human face must be detected accurately and the emotion should be recognized based on the detected face. In this paper, based on the Cohn-Kanade Database, one of the face databases, faces are detected by combining the detected faces with the database.

  • PDF

Nearest Neighbor-based Pre-processing Scheme for Advanced Skyline Query (최근접 이웃 탐색 기반의 향상된 스카이라인 질의를 위한 전처리 기법)

  • Kim, Ji-Hyun;Lee, SangMin;Jeon, Hyeongjun;Jin, ChangGyun;Kim, JiYunm;Kwon, Jin youngm;Kim, Jongwanm;Oh, Dukshinm
    • Annual Conference of KIPS
    • /
    • 2020.05a
    • /
    • pp.420-423
    • /
    • 2020
  • 스카이라인 질의는 객체의 속성을 기준으로 사용자의 선호에 적합한 대상을 탐색하는 기법이다. 기존 스카이라인 질의는 일괄처리 방식으로 탐색 결과를 반환하지만 대화형 앱이나 모바일 환경과 같이 잦은 위치이동 발생 시 일괄처리 방식으로 스카이라인 질의 결과를 신속하게 받기 어렵다. 최근접 이웃(Nearest Neighbor) 알고리즘은 사용자와 상호 작용이 필요한 대화형 앱에서 실시간으로 선호 객체를 탐색하여 사용자에게 전달함으로써 객체의 반환 속도를 향상시켰다. 그러나 최근접 이웃 알고리즘은 객체 탐색 과정에서 반복적인 비교 연산을 수행하여 불필요한 탐색 시간이 소요된다. 본 논문은 대화형 앱에서 신속한 스카이라인 결과를 산출하고자 연산 대상 객체의 범위를 축소함으로써 최근접 이웃 스카이라인 질의 알고리즘의 성능을 향상시킨 전처리 기법을 제안한다. 데이터 객체는 최대 40,000 개의 실험에서 제안 기법은 최근접 이웃 알고리즘보다 50% 빠른 성능을 나타내어 본 연구의 가용성이 증명되었다.

Efficient Top-k Query Processing Algorithm Using Grid Index-based View Selection Method (그리드 인덱스 기반 뷰 선택 기법을 이용한 효율적인 Top-k 질의처리 알고리즘)

  • Hong, Seungtae;Youn, Deulnyeok;Chang, Jae Woo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.1
    • /
    • pp.76-81
    • /
    • 2015
  • Research on top-k query processing algorithms for analyzing big data have been spotlighted recently. However, because existing top-k query processing algorithms do not provide an efficient index structure, they incur high query processing costs and cannot support various types of queries. To solve these problems, we propose a top-k query processing algorithm using a view selection method based on a grid index. The proposed algorithm reduces the query processing time by retrieving the minimum number of grid cells for the query range, by using a grid index-based view selection method. Finally, we show from our performance analysis that the proposed scheme outperforms an existing scheme, in terms of both query processing time and query result accuracy.

Rhipe Platform for Big Data Processing and Analysis (빅데이터 처리 및 분석을 위한 Rhipe 플랫폼)

  • Jung, Byung Ho;Shin, Ji Eun;Lim, Dong Hoon
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1171-1185
    • /
    • 2014
  • Rhipe that integrates R and Hadoop environment, made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data and simulated data. Experimental results for comparing the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster, showed fully-distributed mode was more fast than pseudo-distributed mode and computing speeds of fully-distributed mode were faster as the number of data nodes increases. We also compared the performance of our Rhipe with stats and biglm packages available on bigmemory. The results showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Development of Big Data and AutoML Platforms for Smart Plants (스마트 플랜트를 위한 빅데이터 및 AutoML 플랫폼 개발)

  • Jin-Young Kang;Byeong-Seok Jeong
    • The Journal of Bigdata
    • /
    • v.8 no.2
    • /
    • pp.83-95
    • /
    • 2023
  • Big data analytics and AI play a critical role in the development of smart plants. This study presents a big data platform for plant data and an 'AutoML platform' for AI-based plant O&M(Operation and Maintenance). The big data platform collects, processes and stores large volumes of data generated in plants using Hadoop, Spark, and Kafka. The AutoML platform is a machine learning automation system aimed at constructing predictive models for equipment prognostics and process optimization in plants. The developed platforms configures a data pipeline considering compatibility with existing plant OISs(Operation Information Systems) and employs a web-based GUI to enhance both accessibility and convenience for users. Also, it has functions to load user-customizable modules into data processing and learning algorithms, which increases process flexibility. This paper demonstrates the operation of the platforms for a specific process of an oil company in Korea and presents an example of an effective data utilization platform for smart plants.

Analysis of Trend for BigData Processing Technology by DW Appliance (DW 어플라이언스를 통한 빅데이터 처리 기술 동향 분석)

  • Choi, Ro-Hwan;Park, Seok-Cheon;Sim, Bong-Soo
    • Annual Conference of KIPS
    • /
    • 2013.05a
    • /
    • pp.904-907
    • /
    • 2013
  • 최근 정보통신기술이 하루가 다르게 발전함에 따라 하루에도 수많은 데이터가 흘러나오는 최근의 추세이다. 정형 데이터 뿐 아니라 비정형 데이터 분석까지 진행하는 최근의 추세에 맞춰 현 빅데이터 기술 동향을 분석한다. 빅데이터 시대를 맞아 기존의 데이터웨어하우스(DW)와 발전된 데이터웨어하우스(DW) 어플라이언스에 대해 분석하고 향후 발전 전망과 방향을 제시한다.

Study on the Sensor Gateway for Receive the Real-Time Big Data in the IoT Environment (IoT 환경에서 실시간 빅 데이터 수신을 위한 센서 게이트웨이에 관한 연구)

  • Shin, Seung-Hyeok
    • Journal of Advanced Navigation Technology
    • /
    • v.19 no.5
    • /
    • pp.417-422
    • /
    • 2015
  • A service size of the IoT environment is determined by the number of sensors. The number of sensors increase means increases the amount of data generated by the IoT environment. There are studies to reliably operate a network for research and operational dynamic buffer for data when network congestion control congestion in the network environment. There are also studies of the stream data that has been processed in the connectionless network environment. In this study, we propose a sensor gateway for processing big data of the IoT environment. For this, review the RESTful for designing a sensor middleware, and apply the double-buffer algorithm to process the stream data efficiently. Finally, it generates a big data traffic using the MJpeg stream that is based on the HTTP protocol over TCP to evaluate the proposed system, with open source media player VLC using the image received and compare the throughput performance.

A Study on Building a Model for Safety Management of Small Buildings using Big Data (빅데이터를 활용한 소규모 건축물 안전관리 모델에 관한 연구)

  • Shin, Dongyoun
    • Journal of KIBIM
    • /
    • v.13 no.1
    • /
    • pp.13-21
    • /
    • 2023
  • The purpose of this study is to establish a system that manages the safety of buildings efficiently by finding the correlation of elements related to the safety of buildings and intuitively visualizing them. Data were collected using the data of small-scale buildings managed by public institutions and the government, and an effective analysis visualization environment was established through pre-processing. We selected safety-vulnerable factors such as the structure of the building and completion date to find the relationship, and established a model to prioritize management to find vulnerable buildings.