• Title/Summary/Keyword: HADOOP

Search Result 394, Processing Time 0.026 seconds

A Normal Network Behavior Profiling Method Based on Big Data Analysis Techniques (Hadoop/Hive) (빅데이터 분석 기술(Hadoop/Hive) 기반 네트워크 정상행위 규정 방법)

  • Kim, SungJin;Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.27 no.5
    • /
    • pp.1117-1127
    • /
    • 2017
  • With the advent of Internet of Things (IoT), the number of devices connected to Internet has rapidly increased, but the security for IoT is still vulnerable. It is difficult to integrate existing security technologies due to generating a large amount of traffic by using different protocols to use various IoT devices according to purposes and to operate in a low power environment. Therefore, in this paper, we propose a normal network behavior profiling method based on big data analysis techniques. The proposed method utilizes a Hadoop/Hive for Big Data analytics and an R for statistical computing. Also we verify the effectiveness of the proposed method through a simulation.

Analyzing Smart Grid Energy Data using Hadoop Based Big Data System (하둡기반 빅데이터 시스템을 이용한 스마트그리드 전력데이터 분석)

  • Cho, YoungTak;Lee, WonJin;Lee, Ingyu;On, Byung-Won;Choi, Jung-In
    • The Transactions of the Korean Institute of Electrical Engineers P
    • /
    • v.64 no.2
    • /
    • pp.85-91
    • /
    • 2015
  • With the increasing popularity of Smart Grid infrastructure, it is much easier to collect energy usage data using AMI (Advanced Measuring Instrument) from residential housing, buildings and factories. Several researches have been done to improve an energy efficiency by analyzing the collected energy usage data. However, it is not easy to store and analyze the energy data using a traditional relational database management system since the data size grows exponentially with an increasing popularity of Smart grid infrastructure. In this paper, we are proposing a Hadoop based Big data system to store and analyze energy usage data. Based on our limited experiments, Hadoop based energy data analysis is three times faster than that of a relational database management system based approach with the current system.

An Efficient Parallel Construction Scheme of An R-Tree using Hadoop (Hadoop을 이용한 R-트리의 효율적인 병렬 구축 기법)

  • Cong, Viet-Ngu Huynh;Kim, Jongmin;Kwon, Oh-Heum;Song, Ha-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.2
    • /
    • pp.231-241
    • /
    • 2019
  • Bulk-loading an R-tree can be a good approach to build an efficient one. However, it takes a lot of time to bulk-load an R-tree for huge amount of data. In this paper, we propose a parallel R-tree construction scheme based on a Hadoop framework. The proposed scheme divides the data set into a number of partitions for which local R-trees are built in parallel via Map-Reduce operations. Then the local R-trees are merged into an global R-tree that covers the whole data set. While generating the partitions, it considers the spatial distribution of the data into account so that each partition has nearly equal amounts of data. Therefore, the proposed scheme gives an efficient index structure while reducing the construction time. Experimental tests show that the proposed scheme builds an R-tree more efficiently than the existing approaches.

A Study on implementation model for security log analysis system using Big Data platform (빅데이터 플랫폼을 이용한 보안로그 분석 시스템 구현 모델 연구)

  • Han, Ki-Hyoung;Jeong, Hyung-Jong;Lee, Doog-Sik;Chae, Myung-Hui;Yoon, Cheol-Hee;Noh, Kyoo-Sung
    • Journal of Digital Convergence
    • /
    • v.12 no.8
    • /
    • pp.351-359
    • /
    • 2014
  • The log data generated by security equipment have been synthetically analyzed on the ESM(Enterprise Security Management) base so far, but due to its limitations of the capacity and processing performance, it is not suited for big data processing. Therefore the another way of technology on the big data platform is necessary. Big Data platform can achieve a large amount of data collection, storage, processing, retrieval, analysis, and visualization by using Hadoop Ecosystem. Currently ESM technology has developed in the way of SIEM (Security Information & Event Management) technology, and to implement security technology in SIEM way, Big Data platform technology is essential that can handle large log data which occurs in the current security devices. In this paper, we have a big data platform Hadoop Ecosystem technology for analyzing the security log for sure how to implement the system model is studied.

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun;Ku, Mino;Kim, Baul;Xuhua, Rui;Min, Dugki
    • Journal of Information Technology and Architecture
    • /
    • v.11 no.4
    • /
    • pp.449-462
    • /
    • 2014
  • In big data era, there are a number of considerable parts in processing systems for capturing, storing, and analyzing stored or streaming data. Unlike traditional data handling systems, a big data processing system needs to concern the characteristics (format, velocity, and volume) of being handled data in the system. In this situation, virtualized computing platform is an emerging platform for handling big data effectively, since virtualization technology enables to manage computing resources dynamically and elastically with minimum efforts. In this paper, we analyze resource utilization of virtualized computing resources to discover suitable deployment models in Apache Hadoop and HBase-based big data processing environment. Consequently, Task Tracker service shows high CPU utilization and high Disk I/O overhead during MapReduce phases. Moreover, HRegion service indicates high network resource consumption for transfer the traffic data from DataNode to Task Tracker. DataNode shows high memory resource utilization and Disk I/O overhead for reading stored data.

A Hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku;Jeong, Changsung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.11
    • /
    • pp.2827-2848
    • /
    • 2012
  • Previously, we described a social media cloud computing service environment (SMCCSE). This SMCCSE supports the development of social networking services (SNSs) that include audio, image, and video formats. A social media cloud computing PaaS platform, a core component in a SMCCSE, processes large amounts of social media in a parallel and distributed manner for supporting a reliable SNS. Here, we propose a Hadoop-based multimedia system for image and video transcoding processing, necessary functions of our PaaS platform. Our system consists of two modules, including an image transcoding module and a video transcoding module. We also design and implement the system by using a MapReduce framework running on a Hadoop Distributed File System (HDFS) and the media processing libraries Xuggler and JAI. In this way, our system exponentially reduces the encoding time for transcoding large amounts of image and video files into specific formats depending on user-requested options (such as resolution, bit rate, and frame rate). In order to evaluate system performance, we measure the total image and video transcoding time for image and video data sets, respectively, under various experimental conditions. In addition, we compare the video transcoding performance of our cloud-based approach with that of the traditional frame-level parallel processing-based approach. Based on experiments performed on a 28-node cluster, the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality.

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.2
    • /
    • pp.128-133
    • /
    • 2017
  • In recent years, the number of systems for the analysis of large volumes of data is increasing. Hadoop, a representative big data system, stores and processes the large data in the distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies from the rapid changing of the log data that are collected from the multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data that were collected from the multiple servers in the Hadoop ecosystem. Also, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. It was finally confirmed that all three of the techniques detected the abnormal intervals correctly, while the weighted anomaly-detection technique is more precise than the basic techniques. These results show an excellent approach for the detection of log-data anomalies with the use of simple techniques in the Hadoop ecosystem.

Addressing Big Data solution enabled Connected Vehicle services using Hadoop (Hadoop을 이용한 스마트 자동차 서비스용 빅 데이터 솔루션 개발)

  • Nkenyereye, Lionel;Jang, Jong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.3
    • /
    • pp.607-612
    • /
    • 2015
  • As the amount of vehicle's diagnostics data increases, the actors in automotive ecosystem will encounter difficulties to perform a real time analysis in order to simulate or to design new services according to the data gathered from the connected cars. In this paper, we have conducted a study of a Big Data solution that expresses the essential deep analytics to process and analyze vast quantities of vehicles on board diagnostics data generated by cars. Hadoop and its ecosystems have been deployed to process a large data and delivered useful outcomes that may be used by actors in automotive ecosystem to deliver new services to car owners. As the Intelligent transport system is involved to guarantee safety, reduce rate of crash and injured in the accident due to speed, addressing big data solution based on vehicle diagnostics data is upcoming to monitor real time outcome from it and making collection of data from several connected cars, facilitating reliable processing and easier storage of data collected.

Analysis of the Influence Factors of Data Loading Performance Using Apache Sqoop (아파치 스쿱을 사용한 하둡의 데이터 적재 성능 영향 요인 분석)

  • Chen, Liu;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.2
    • /
    • pp.77-82
    • /
    • 2015
  • Big Data technology has been attracted much attention in aspect of fast data processing. Research of practicing Big Data technology is also ongoing to process large-scale structured data much faster in Relatioinal Database(RDB). Although there are lots of studies about measuring analyzing performance, studies about structured data loading performance, prior step of analyzing, is very rare. Thus, in this study, structured data in RDB is tested the performance that loads distributed processing platform Hadoop using Apache sqoop. Also in order to analyze the influence factors of data loading, it is tested repeatedly with different options of data loading and compared with data loading performance among RDB based servers. Although data loading performance of Apache Sqoop in test environment was low, but in large-scale Hadoop cluster environment we can expect much better performance because of getting more hardware resources. It is expected to be based on study improving data loading performance and whole steps of performance analyzing structured data in Hadoop Platform.

Rhipe Platform for Big Data Processing and Analysis (빅데이터 처리 및 분석을 위한 Rhipe 플랫폼)

  • Jung, Byung Ho;Shin, Ji Eun;Lim, Dong Hoon
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1171-1185
    • /
    • 2014
  • Rhipe that integrates R and Hadoop environment, made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data and simulated data. Experimental results for comparing the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster, showed fully-distributed mode was more fast than pseudo-distributed mode and computing speeds of fully-distributed mode were faster as the number of data nodes increases. We also compared the performance of our Rhipe with stats and biglm packages available on bigmemory. The results showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.