• Title/Summary/Keyword: Hadoop MapReduce Framework

Search Result 46, Processing Time 0.03 seconds

Structural Change Detection Technique for RDF Data in MapReduce (맵리듀스에서의 구조적 RDF 데이터 변경 탐지 기법)

  • Lee, Taewhi;Im, Dong-Hyuk
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.8
    • /
    • pp.293-298
    • /
    • 2014
  • Detecting and understanding the changes between RDF data is crucial in the evolutionary process, synchronization system, and versioning system on the web of data. However, current researches on detecting changes still remain unsatisfactory in that they did neither consider the large scale of RDF data nor accurately produce the RDF deltas. In this paper, we propose a scalable and effective change detection using a MapReduce framework which has been used in many fields to process and analyze large volumes of data. In particular, we focus on the structure-based change detection that adopts a strategy for the comparison of blank nodes in RDF data. To achieve this, we employ a method which is composed of two MapReduce jobs. First job partitions the triples with blank nodes by grouping each triple with the same blank node ID and then computes the incoming path to the blank node. Second job partitions the triples with the same path and matchs blank nodes with the Hungarian method. In experiments, we show that our approach is more accurate and effective than the previous approach.

Design and Implementation of Big Data Platform for Image Processing in Agriculture (농업 이미지 처리를 위한 빅테이터 플랫폼 설계 및 구현)

  • Nguyen, Van-Quyet;Nguyen, Sinh Ngoc;Vu, Duc Tiep;Kim, Kyungbaek
    • Annual Conference of KIPS
    • /
    • 2016.10a
    • /
    • pp.50-53
    • /
    • 2016
  • Image processing techniques play an increasingly important role in many aspects of our daily life. For example, it has been shown to improve agricultural productivity in a number of ways such as plant pest detecting or fruit grading. However, massive quantities of images generated in real-time through multi-devices such as remote sensors during monitoring plant growth lead to the challenges of big data. Meanwhile, most current image processing systems are designed for small-scale and local computation, and they do not scale well to handle big data problems with their large requirements for computational resources and storage. In this paper, we have proposed an IPABigData (Image Processing Algorithm BigData) platform which provides algorithms to support large-scale image processing in agriculture based on Hadoop framework. Hadoop provides a parallel computation model MapReduce and Hadoop distributed file system (HDFS) module. It can also handle parallel pipelines, which are frequently used in image processing. In our experiment, we show that our platform outperforms traditional system in a scenario of image segmentation.

Implementation of a Raspberry-Pi-Sensor Network (라즈베리파이 센서 네트워크 구현)

  • Moon, Sangook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.10a
    • /
    • pp.915-916
    • /
    • 2014
  • With the upcoming era of internet of things, the study of sensor network has been paid attention. Raspberry pi is a tiny versatile computer system which is able to act as a sensor node in hadoop cluster network. In this paper, we deployed 5 Raspberry pi's to construct an experimental testbed of hadoop sensor network with 5-node map-reduce hadoop software framework. We compared and analyzed the network architecture in terms of efficiency, resource management, and throughput using various parameters. We used a learning machine with support vector machine as test workload. In our experiments, Raspberry pi fulfilled the role of distributed computing sensor node in the sensor network.

  • PDF

A MapReduce-Based Workflow BIG-Log Clustering Technique (맵리듀스기반 워크플로우 빅-로그 클러스터링 기법)

  • Jin, Min-Hyuck;Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services
    • /
    • v.20 no.1
    • /
    • pp.87-96
    • /
    • 2019
  • In this paper, we propose a MapReduce-supported clustering technique for collecting and classifying distributed workflow enactment event logs as a preprocessing tool. Especially, we would call the distributed workflow enactment event logs as Workflow BIG-Logs, because they are satisfied with as well as well-fitted to the 5V properties of BIG-Data like Volume, Velocity, Variety, Veracity and Value. The clustering technique we develop in this paper is intentionally devised for the preprocessing phase of a specific workflow process mining and analysis algorithm based upon the workflow BIG-Logs. In other words, It uses the Map-Reduce framework as a Workflow BIG-Logs processing platform, it supports the IEEE XES standard data format, and it is eventually dedicated for the preprocessing phase of the ${\rho}$-Algorithm that is a typical workflow process mining algorithm based on the structured information control nets. More precisely, The Workflow BIG-Logs can be classified into two types: of activity-based clustering patterns and performer-based clustering patterns, and we try to implement an activity-based clustering pattern algorithm based upon the Map-Reduce framework. Finally, we try to verify the proposed clustering technique by carrying out an experimental study on the workflow enactment event log dataset released by the BPI Challenges.

A Hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku;Jeong, Changsung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.11
    • /
    • pp.2827-2848
    • /
    • 2012
  • Previously, we described a social media cloud computing service environment (SMCCSE). This SMCCSE supports the development of social networking services (SNSs) that include audio, image, and video formats. A social media cloud computing PaaS platform, a core component in a SMCCSE, processes large amounts of social media in a parallel and distributed manner for supporting a reliable SNS. Here, we propose a Hadoop-based multimedia system for image and video transcoding processing, necessary functions of our PaaS platform. Our system consists of two modules, including an image transcoding module and a video transcoding module. We also design and implement the system by using a MapReduce framework running on a Hadoop Distributed File System (HDFS) and the media processing libraries Xuggler and JAI. In this way, our system exponentially reduces the encoding time for transcoding large amounts of image and video files into specific formats depending on user-requested options (such as resolution, bit rate, and frame rate). In order to evaluate system performance, we measure the total image and video transcoding time for image and video data sets, respectively, under various experimental conditions. In addition, we compare the video transcoding performance of our cloud-based approach with that of the traditional frame-level parallel processing-based approach. Based on experiments performed on a 28-node cluster, the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality.

SPARQL Query Processing System over Scalable Triple Data using SparkSQL Framework (SparQLing : SparkSQL 기반 대용량 트리플 데이터를 위한 SPARQL 질의 시스템 구축)

  • Jeon, MyungJoong;Hong, JinYoung;Park, YoungTack
    • Journal of KIISE
    • /
    • v.43 no.4
    • /
    • pp.450-459
    • /
    • 2016
  • Every year, RDFS data tends further toward scalability; hence, the manner of SPARQL processing needs to be changed for fast query. The query processing method of SPARQL has been studied using a scalable distributed processing framework. Current studies indicate that the query engine based on the scalable distributed processing framework i.e., Hadoop(MapReduce) is not suitable for real-time processing because of the repetitive tasks; in addition, it is difficult to construct a query engine based on an In-memory Distributed Query engine, because distributed structure on the low-level is required to be considered. In this paper, we proposed a method to construct a query engine for improving the speed of the query process with the mass triple data. The query engine processes the query of SPARQL using the SparkSQL, which is an In-memory based, distributed query processing framework. SparkSQL is a high-level distributed query engine that facilitates existing SQL statement. In order to process the SPARQL query, after generating the Algebra Tree using Jena, the Algebra Tree is required to be translated to Spark Algebra Tree for application in the Spark system, and construction of the system that generated the SparkSQL query. Furthermore, we proposed the design of triple property table based on DataFrame for more efficient query processing in the Spark system. Finally, we verified the validity through comparative evaluation with the query engine, which is the existing distributed processing framework.

Big Data Preprocessing for Predicting Box Office Success (영화 흥행 실적 예측을 위한 빅데이터 전처리)

  • Jun, Hee-Gook;Hyun, Geun-Soo;Lim, Kyung-Bin;Lee, Woo-Hyun;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.12
    • /
    • pp.615-622
    • /
    • 2014
  • The Korean film market has rapidly achieved an international scale, and this has led to a need for decision-making based on analytical methods that are more precise and appropriate. In this modern era, a highly advanced information environment can provide an overwhelming amount of data that is generated in real time, and this data must be properly handled and analyzed in order to extract useful information. In particular, the preprocessing of large data, which is the most time-consuming step, should be done in a reasonable amount of time. In this paper, we investigated a big data preprocessing method for predicting movie box office success. We analyzed the movie data characteristics for specialized preprocessing methods, and used the Hadoop MapReduce framework. The experimental results showed that the preprocessing methods using big data techniques are more effective than existing methods.

Comparison analysis of big data integration models (빅데이터 통합모형 비교분석)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.4
    • /
    • pp.755-768
    • /
    • 2017
  • As Big Data becomes the core of the fourth industrial revolution, big data-based processing and analysis capabilities are expected to influence the company's future competitiveness. Comparative studies of RHadoop and RHIPE that integrate R and Hadoop environment, have not been discussed by many researchers although RHadoop and RHIPE have been discussed separately. In this paper, we constructed big data platforms such as RHadoop and RHIPE applicable to large scale data and implemented the machine learning algorithms such as multiple regression and logistic regression based on MapReduce framework. We conducted a study on performance and scalability with those implementations for various sample sizes of actual data and simulated data. The experiments demonstrated that our RHadoop and RHIPE can scale well and efficiently process large data sets on commodity hardware. We showed RHIPE is faster than RHadoop in almost all the data generally.

Distributed Support Vector Machines for Localization on a Sensor Newtork (센서 네트워크에서 위치 측정을 위한 분산 지지 벡터 머신)

  • Moon, Sangook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.10a
    • /
    • pp.944-946
    • /
    • 2014
  • Localization of a sensor network node using machine learning has been recently studied. It is easy for Support vector machines algorithm to implement in high level language enabling parallelism. In this paper, we realized Support vector machine using python language and built a sensor network cluster with 5 Pi's. We also established a Hadoop software framework to employ MapReduce mechanism. We modified the existing Support vector machine algorithm to fit into the distributed hadoop architecture system for localization of a sensor node. In our experiment, we implemented the test sensor network with a variety of parameters and examined based on proficiency, resource evaluation, and processing time.

  • PDF

Big IoT Healthcare Data Analytics Framework Based on Fog and Cloud Computing

  • Alshammari, Hamoud;El-Ghany, Sameh Abd;Shehab, Abdulaziz
    • Journal of Information Processing Systems
    • /
    • v.16 no.6
    • /
    • pp.1238-1249
    • /
    • 2020
  • Throughout the world, aging populations and doctor shortages have helped drive the increasing demand for smart healthcare systems. Recently, these systems have benefited from the evolution of the Internet of Things (IoT), big data, and machine learning. However, these advances result in the generation of large amounts of data, making healthcare data analysis a major issue. These data have a number of complex properties such as high-dimensionality, irregularity, and sparsity, which makes efficient processing difficult to implement. These challenges are met by big data analytics. In this paper, we propose an innovative analytic framework for big healthcare data that are collected either from IoT wearable devices or from archived patient medical images. The proposed method would efficiently address the data heterogeneity problem using middleware between heterogeneous data sources and MapReduce Hadoop clusters. Furthermore, the proposed framework enables the use of both fog computing and cloud platforms to handle the problems faced through online and offline data processing, data storage, and data classification. Additionally, it guarantees robust and secure knowledge of patient medical data.