• Title/Summary/Keyword: HADOOP

Search Results: 392

Data Analysis of Car Sensor System using Hadoop Framework (Hadoop을 이용한 자동차 센서 데이터 분석 기법 연구)

  • Yoon, Jae-Yeol;Lim, Ji-Yeon;Kim, Iee-Joon;Kim, Ung-Mo
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.216-219 / 2012
  • The big data environment that has emerged from the recent diversification and massive growth of information is opening up research directions in many fields. Data volumes already exceed the petabyte scale, and methods for processing them are under active study. In this paper, we present an approach to studying vehicle sensor data, a class of sensor data for which large-scale networks have become feasible thanks to advances in wireless communication devices and sensor technology. We analyze automotive sensor data (CAN messages) using the Hadoop system, which has drawn attention with the rise of the big data concept.
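
The abstract describes batch analysis of CAN messages on Hadoop but gives no code. As a minimal, hypothetical sketch of that kind of job, the following MapReduce program counts messages per CAN ID; the comma-separated record layout (canId,payload,timestamp) is an assumption, not the paper's format.

```java
// Minimal sketch (not from the paper): counting CAN messages per CAN ID
// with classic Hadoop MapReduce. Record layout "canId,payload,timestamp"
// is assumed for illustration.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CanMessageCount {
    public static class CanMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text canId = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Assumed record layout: canId,payload,timestamp
            String[] fields = value.toString().split(",");
            if (fields.length >= 1) {
                canId.set(fields[0]);
                ctx.write(canId, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "can-message-count");
        job.setJarByClass(CanMessageCount.class);
        job.setMapperClass(CanMapper.class);
        job.setCombinerClass(SumReducer.class);   // local aggregation before shuffle
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```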

Design an Indexing Structure System Based on Apache Hadoop in Wireless Sensor Network

  • Keo, Kongkea;Chung, Yeongjee
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.45-48 / 2013
  • In this paper, we propose an Indexing Structure System (ISS) based on Apache Hadoop for Wireless Sensor Networks (WSN). Sensor data grows continuously and must be managed, and it is updated constantly so that users receive the newest information; as it grows, retrieving and storing it becomes challenging. The ISS maximizes processing quality and minimizes data retrieval time. To design the ISS, indexing types are first defined for each sensor type. After this identification, each sensor's data passes through Indexing Structure Processing (ISP) to be indexed. The indexed data is then streamed to and stored in the Hadoop Distributed File System (HDFS) across a number of separate machines, where it is split and processed by MapReduce tasks that sort and group it by sensor data object category. When users send requests, the queries are filtered by sensor data object and the tasks are managed by the MapReduce processing framework.
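
The ISP pipeline is described only at the architecture level. The sketch below is a rough illustration, under an assumed file layout and record schema, of how indexed sensor records might be streamed into HDFS with the standard FileSystem API; it is not the authors' implementation.

```java
// Minimal sketch (assumed layout and schema, not the paper's ISP code):
// streaming indexed sensor records into HDFS with the FileSystem API.
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SensorIndexWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical layout: one directory per sensor type
        Path out = new Path("/iss/temperature/part-0");
        try (FSDataOutputStream stream = fs.create(out, true);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(stream, StandardCharsets.UTF_8))) {
            // Assumed record: indexKey \t sensorId \t value \t timestamp
            writer.write("temp-000001\tsensor-42\t21.5\t1367107200\n");
        }
    }
}
```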

A File Merging Scheme for Efficient Handling of Small Files in Hadoop Distributed File System (Hadoop Distribute file system에서 Small file을 효과적으로 처리하기 위한 파일 병합 기법 연구)

  • Park, Jong-Chang;Youn, Hee-Yong
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.15-17 / 2013
  • HDFS (Hadoop Distributed File System) was designed for processing large files and is currently in the spotlight as an ideal distributed file system. HDFS shares many similarities with existing distributed file systems, but it is distinguished by providing fault tolerance and by supporting a streaming data access pattern, which lets it store large files efficiently. In practice, however, small files account for a considerable share of HDFS data sets, and large numbers of small files not only incur high data processing costs but also degrade the master node's file handling and memory performance. This paper therefore analyzes the impact of small files on HDFS and proposes a file merging scheme, based on a local index file, that addresses the problem.
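
The paper's local-index-based scheme is not spelled out in the abstract. A common baseline for the same problem is to pack small files into one SequenceFile keyed by file name, as sketched below; this illustrates the general merging idea only, not the proposed technique.

```java
// Minimal sketch: packing many small HDFS files into one SequenceFile,
// keyed by the original file name. A standard baseline, not the paper's
// local-index-based merging scheme.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileMerger {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);   // directory of small files
        Path merged = new Path(args[1]);     // output SequenceFile

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(merged),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isFile()) {
                    // Small files by assumption, so an int-sized buffer is safe
                    byte[] content = new byte[(int) status.getLen()];
                    try (FSDataInputStream in = fs.open(status.getPath())) {
                        in.readFully(content);
                    }
                    // Key = original file name, value = raw file bytes
                    writer.append(new Text(status.getPath().getName()),
                                  new BytesWritable(content));
                }
            }
        }
    }
}
```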

Spatial Computation on Spark Using GPGPU (GPGPU를 활용한 스파크 기반 공간 연산)

  • Son, Chanseung;Kim, Daehee;Park, Neungsoo
    • KIPS Transactions on Computer and Communication Systems / v.5 no.8 / pp.181-188 / 2016
  • Recently, as the amount of spatial information has increased, interest in spatial information processing has grown. Spatial database systems extended from traditional relational database systems have difficulty handling large data sets because of limited scalability. SpatialHadoop, an extension of the Hadoop system, suffers low performance because its spatial computations require many writes of intermediate results to disk. In this paper, Spatial Computation Spark (SC-Spark), an in-memory distributed processing framework, is proposed. SC-Spark extends Spark to perform spatial operations on large-scale data efficiently. In addition, a GPGPU-based SC-Spark is developed to improve performance further. SC-Spark exploits Spark's ability to hold intermediate results in memory, and the GPGPU-based variant performs spatial operations in parallel using the many processing elements of a GPU. To verify the proposed work, experiments were performed on a single AMD system using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operations. The results show that SC-Spark and GPGPU-based SC-Spark are up to eight times faster than SpatialHadoop.
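
SC-Spark's GPGPU kernels are not shown in the abstract. As a CPU-only illustration of the Point-in-Polygon operation it benchmarks, the following Spark sketch filters points inside a polygon with a ray-casting test; the input format ("x,y" lines) and the query polygon are assumptions.

```java
// Minimal sketch (assumption, not SC-Spark itself): a CPU-only
// Point-in-Polygon filter on Spark using a ray-casting test.
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PointInPolygon {
    // Ray casting: count how often a horizontal ray from (x, y) crosses edges
    static boolean contains(double[] xs, double[] ys, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            if ((ys[i] > y) != (ys[j] > y)
                    && x < (xs[j] - xs[i]) * (y - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("point-in-polygon");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical query polygon: the unit square
            double[] xs = {0, 1, 1, 0};
            double[] ys = {0, 0, 1, 1};
            JavaRDD<String> lines = sc.textFile(args[0]); // lines of "x,y"
            List<String> hits = lines.filter(line -> {
                String[] p = line.split(",");
                return contains(xs, ys,
                        Double.parseDouble(p[0]), Double.parseDouble(p[1]));
            }).collect();
            hits.forEach(System.out::println);
        }
    }
}
```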

A Block Relocation Algorithm for Reducing Network Consumption in Hadoop Cluster (하둡 클러스터의 네트워크 사용량 감소를 위한 블록 재배치 알고리즘)

  • Kim, Jun-Sang;Kim, Chang-Hyeon;Lee, Won-Joo;Jeon, Chang-Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.11 / pp.9-15 / 2014
  • In this paper, we propose a block relocation algorithm for reducing network traffic in a Hadoop cluster. The Hadoop scheduler receives a job from users, divides it into multiple tasks, and assigns the tasks to nodes, preferring nodes that satisfy data locality. If a task is assigned to a node that does not hold the data (block) to be processed, the task runs only after the data has been transferred from another node. Because blocks in the cluster have different access frequencies, the workload differs among nodes. The proposed algorithm therefore relocates blocks according to the task allocation pattern of the Hadoop scheduler. As a result, node workloads are leveled, tasks are assigned less often to nodes that lack the block they must process, and the network traffic of the cluster is reduced. We evaluate the proposed algorithm by simulation; the results show up to a 23.3% reduction in network consumption compared with default delay scheduling.
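
The abstract outlines the relocation idea without pseudocode. The sketch below shows one plausible greedy reading, under assumed access counts: visit blocks from hottest to coldest and place each on the currently least-loaded node. It is an illustration only, not the paper's algorithm.

```java
// Minimal sketch of a greedy reading (assumption; not the paper's exact
// algorithm): move frequently accessed blocks toward lightly loaded nodes.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class BlockRelocationSketch {
    public static void main(String[] args) {
        // Hypothetical access counts observed from the scheduler's task log
        Map<String, Integer> blockAccesses = Map.of(
                "blk-1", 90, "blk-2", 10, "blk-3", 55);
        // Current load per node (sum of access counts of hosted blocks)
        Map<String, Integer> nodeLoad = new HashMap<>(Map.of(
                "node-A", 120, "node-B", 20, "node-C", 15));

        // Hottest block first, each assigned to the least-loaded node
        blockAccesses.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> {
                    String target = Collections.min(nodeLoad.entrySet(),
                            Map.Entry.comparingByValue()).getKey();
                    nodeLoad.merge(target, e.getValue(), Integer::sum);
                    System.out.println("relocate " + e.getKey() + " -> " + target);
                });
    }
}
```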

GLORY-FS: A Distributed File System for Large-Scale Internet Services (GLORY-FS: 대규모 인터넷 서비스를 위한 분산 파일 시스템)

  • Kim, Hong-Yeon;Jin, Gi-Seong;Cha, Myeong-Hun;Lee, Sang-Min;Lee, Sang-Min;Kim, Yeong-Cheol;Kim, Yeong-Gyun
    • Information and Communications Magazine / v.30 no.4 / pp.16-22 / 2013
  • This article surveys the current state of and recent issues in distributed file system technology. It first reviews the status and limitations of the Hadoop distributed file system, regarded as a de facto industry standard in cloud computing and big data analytics, and then compares GLORY-FS, a domestically developed distributed file system with a similar architecture, against the Hadoop file system, examining their similarities and differences based on domestic deployment cases.

Appingpot : Application curation platform based on Hadoop and Spark (Appingpot : 하둡 및 스파크를 활용한 어플리케이션 큐레이션 플랫폼)

  • Jeon, Sangwoo;Shim, Euiseok;Chi, Jeonghee
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.372-373 / 2016
  • Curation services are actively operated today, both in Korea and abroad. In an application market that has grown explosively, users find it increasingly difficult to discover and install apps that suit them. In response, this paper proposes Appingpot, an application curation service. Based on app log data collected from users and Facebook friend information, Appingpot uses Hadoop and Spark to recommend suitable apps to users.
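
The abstract does not say which recommendation algorithm Appingpot uses. One plausible realization on Spark is implicit-feedback ALS from MLlib, sketched below; the (userId,appId,score) log schema is an assumption.

```java
// Minimal sketch (assumption: the paper does not name its recommender;
// implicit-feedback ALS on Spark MLlib is one plausible realization).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

public class AppRecommenderSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("appingpot-als-sketch");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Input lines "userId,appId,score" (assumed log-derived schema)
            JavaRDD<Rating> ratings = sc.textFile(args[0]).map(line -> {
                String[] f = line.split(",");
                return new Rating(Integer.parseInt(f[0]),
                        Integer.parseInt(f[1]), Double.parseDouble(f[2]));
            });
            // Train implicit-feedback ALS: rank 10, 10 iterations
            MatrixFactorizationModel model =
                    ALS.trainImplicit(ratings.rdd(), 10, 10);
            // Top-5 app recommendations for a sample user
            for (Rating r : model.recommendProducts(1, 5)) {
                System.out.println("user 1 -> app " + r.product()
                        + " score " + r.rating());
            }
        }
    }
}
```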

Research of Soft-Interface Creation and Provision Methodology According to Applications Based on Mobile Device Environment (모바일 디바이스 환경에서 어플리케이션에 따른 소프트 인터페이스 제작 및 제공 방안 연구)

  • Cho, Changhee;Park, Sanghyun;Lee, Sang-Joon;Kim, Jinsul
    • Journal of Digital Contents Society / v.14 no.4 / pp.513-519 / 2013
  • In this paper, we provide interfaces suited to each user's application environment, along with web-based tools that let users create interfaces applicable to a wide range of application environments. HTML5 is used in the creation process, so users can build various interfaces by dragging the mouse and can apply them to multimedia and game applications as well as documents, using the ASCII codes and key events provided by the Android OS. The interface database is stored and managed in HDFS (Hadoop Distributed File System), and users can retrieve their own designed interfaces or select others at any time through a simple login. To deliver interfaces quickly, Hive, built on Hadoop, is used for search, and the data is returned as XML files that smart mobile devices can process quickly.
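
As a minimal illustration of the Hive-backed lookup described above, the sketch below queries stored interface definitions through the HiveServer2 JDBC driver; the table and column names (user_interfaces, interface_xml, user_id) are hypothetical.

```java
// Minimal sketch (hypothetical table/column names): fetching a user's
// stored interface definitions via the HiveServer2 JDBC driver.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class InterfaceLookup {
    public static void main(String[] args) throws Exception {
        // Standard HiveServer2 JDBC URL; host and database are placeholders
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT interface_xml FROM user_interfaces WHERE user_id = ?")) {
            stmt.setString(1, "alice");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Each row carries one interface definition as XML
                    System.out.println(rs.getString("interface_xml"));
                }
            }
        }
    }
}
```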

Integrated Verification of Hadoop Cluster Prototypes and Analysis Software for SMB (중소기업을 위한 하둡 클러스터의 프로토타입과 분석 소프트웨어의 통합된 검증)

  • Cha, Byung-Rae;Kim, Nam-Ho;Lee, Seong-Ho;Ji, Yoo-Kang;Kim, Jong-Won
    • Journal of Advanced Navigation Technology / v.18 no.2 / pp.191-199 / 2014
  • Recently, research to help small and medium businesses (SMB) adopt cloud computing and the big data paradigm, a booming area of IT, has been increasing. As one of these efforts, in this paper we design and implement prototypes that tentatively build a Hadoop cluster on a private cloud infrastructure. Prototypes are implemented on several hardware types, including single-board computers, PCs, and servers, and their performance is measured. We also present integrated verification results for the data analysis performance of the analysis software running on top of the realized prototypes, using the ASA (American Standard Association) dataset. For this purpose, we implement the analysis software with several open-source tools, such as R, Python, D3, and Java, and perform the tests.