• Title/Summary/Keyword: Hadoop Scheduler

A Block Relocation Algorithm for Reducing Network Consumption in Hadoop Cluster (하둡 클러스터의 네트워크 사용량 감소를 위한 블록 재배치 알고리즘)

  • Kim, Jun-Sang;Kim, Chang-Hyeon;Lee, Won-Joo;Jeon, Chang-Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.11 / pp.9-15 / 2014
  • In this paper, we propose a block relocation algorithm for reducing network traffic in a Hadoop cluster. The scheduler of a Hadoop cluster receives a job from a user and divides it into multiple tasks, which are assigned to nodes. The scheduler tries to allocate each task to a node that satisfies data locality. If a task is assigned to a node that does not hold the data (block) to be processed, the task runs only after the block has been transferred from another node. Because blocks in the cluster are accessed with different frequencies, the workload differs among nodes. The proposed algorithm therefore relocates blocks according to the task allocation pattern of the Hadoop scheduler. As a result, the workload of the nodes is leveled, and tasks are less often processed on a node that does not hold the required block, which in turn reduces the network traffic of the cluster. We evaluate the proposed block relocation algorithm by simulation. The simulation results show a reduction of up to 23.3% in network consumption for job processing compared with default delay scheduling.
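
The idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: `assign_tasks` and `relocate_hot_blocks` are hypothetical names, the scheduler is a simple greedy stand-in rather than Hadoop's delay scheduler, and "relocation" is approximated by adding one replica of the most frequently accessed block to the least loaded node.

```python
from collections import Counter


def assign_tasks(tasks, block_locations, slots_per_node):
    """Greedy stand-in for a locality-aware scheduler (an assumption).

    Each task is identified by the block it reads. A node that already
    holds the block is preferred; otherwise the task goes to the node
    with the most free slots and one remote block transfer is counted.
    """
    free = dict(slots_per_node)  # copy so the caller's dict is untouched
    assignment = []
    remote_transfers = 0
    for block in tasks:
        holders = block_locations.get(block, set())
        local = [n for n in sorted(holders) if free.get(n, 0) > 0]
        if local:
            node = max(local, key=lambda n: free[n])
        else:
            candidates = [n for n in sorted(free) if free[n] > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            node = max(candidates, key=lambda n: free[n])
            remote_transfers += 1
        free[node] -= 1
        assignment.append((block, node))
    return assignment, remote_transfers


def relocate_hot_blocks(tasks, block_locations, nodes, top_k=1):
    """Hypothetical heuristic: replicate the top_k most accessed blocks
    onto the node with the lowest expected load that lacks the block."""
    access = Counter(tasks)
    new_locations = {b: set(h) for b, h in block_locations.items()}
    for block, _ in access.most_common(top_k):
        # Expected load: a block's accesses are split evenly over its holders.
        load = {n: 0.0 for n in nodes}
        for b, holders in new_locations.items():
            for n in holders:
                load[n] += access[b] / len(holders)
        current = new_locations.setdefault(block, set())
        candidates = [n for n in sorted(nodes) if n not in current]
        if candidates:
            current.add(min(candidates, key=lambda n: load[n]))
    return new_locations


if __name__ == "__main__":
    locations = {"b1": {"A"}, "b2": {"B"}}
    tasks = ["b1", "b1", "b1", "b2"]
    slots = {"A": 2, "B": 2}
    _, before = assign_tasks(tasks, locations, slots)
    relocated = relocate_hot_blocks(tasks, locations, ["A", "B"])
    _, after = assign_tasks(tasks, relocated, slots)
    print("remote transfers before/after:", before, after)
```

In this toy setup block `b1` is requested three times but stored only on node `A`, so one task must fetch it remotely; after the hot block is also placed on node `B`, every task can run data-local.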

The Creation and Placement of VMs and Tasks in Virtualized Hadoop Cluster Environments

  • Kim, Tae-Won;Chung, Hae-jin;Kim, Joon-Mo
    • Journal of Korea Multimedia Society / v.15 no.12 / pp.1499-1505 / 2012
  • Recently, distributed processing systems for big data have been actively investigated owing to the development of high-speed network and storage technologies. In addition, system virtualization, which enables efficient use of system resources through server consolidation, has gained increasing recognition. However, many problems occur when a distributed processing system for big data is configured in a virtual machine environment. In this paper, we build a Hadoop cluster in a virtual environment, run experiments on optimizing I/O bandwidth according to the creation and placement of VMs and tasks, and evaluate the results. These results will be used in research on developing a Hadoop scheduler that supports I/O bandwidth balancing in virtual environments.

An Optimal VM creation by considering I/O Bandwidth in Virtualized Hadoop Cluster Environments (가상화된 Hadoop 클러스터 환경에서 I/O 대역폭을 고려한 최적VM 생성)

  • Kim, Tae-Won;Kim, Hyun-Jun;Kim, Joom-Mo
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.151-153 / 2012
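  • Recently, advances in high-speed network and storage technologies have led to active research on distributed processing systems for large-scale data. In addition, system virtualization, which enables efficient use of system resources through server consolidation, has attracted considerable attention. However, many problems arise when a distributed processing system for large-scale data is configured in a virtual machine environment. In this paper, we run and evaluate experiments on optimizing I/O bandwidth according to the number of virtual data nodes when a Hadoop cluster is used in a virtual machine environment. The experimental results of this paper will be used in research on developing a Hadoop scheduler that supports I/O bandwidth balancing in virtual machine environments.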

A Novel Method of Improving Cache Hit-rate in Hadoop MapReduce using SSD Cache

  • Kim, Jong-Chan;An, Jae-Hoon;Kim, Young-Hwan;Jeon, Ki-Man
    • Journal of the Korea Society of Computer and Information / v.20 no.8 / pp.1-6 / 2015
  • A MapReduce program on the Hadoop Distributed File System may run on any unspecified node because of distributed parallel processing and because blocks are replicated for data stability. As a result, cache locality is difficult to guarantee when a solid-state drive (SSD) is used as a cache in Hadoop, and the cache hit rate decreases. In this paper, we suggest a method to improve the cache hit rate by preloading the input data of a MapReduce job onto the SSD cache. To do this, we estimate the blocks that will be used on each node by using the capacity scheduler and block metadata. In this way, we increase the performance of the SSD cache by loading the blocks onto it before the map tasks run.
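
The preloading idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `plan_preload` and `hit_rate` are hypothetical names, and the scheduler's placement decisions and the block metadata are modeled as plain dictionaries instead of being read from the capacity scheduler or HDFS.

```python
def plan_preload(task_nodes, task_blocks):
    """Map each node to the set of blocks its scheduled map tasks will read.

    task_nodes:  {task_id: node_id}, the estimated placement (assumed input)
    task_blocks: {task_id: iterable of block IDs}, from block metadata
    """
    plan = {}
    for task, node in task_nodes.items():
        plan.setdefault(node, set()).update(task_blocks.get(task, ()))
    return plan


def hit_rate(reads, cache):
    """Fraction of (node, block) reads served from that node's cache."""
    if not reads:
        return 0.0
    hits = sum(1 for node, block in reads if block in cache.get(node, set()))
    return hits / len(reads)


if __name__ == "__main__":
    task_nodes = {"m1": "n1", "m2": "n2"}
    task_blocks = {"m1": ["b1", "b2"], "m2": ["b3"]}
    reads = [("n1", "b1"), ("n1", "b2"), ("n2", "b3"), ("n2", "b4")]

    cold = hit_rate(reads, {})  # nothing preloaded
    warm = hit_rate(reads, plan_preload(task_nodes, task_blocks))
    print("hit rate without/with preload:", cold, warm)
```

Here the read of `b4` on node `n2` was not predicted by the plan, so it still misses; the quality of the placement estimate bounds the achievable hit rate.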