Design and Implementation of Distributed Cluster Supporting Dynamic Down-Scaling of the Cluster

  • Woo-Seok Ryu (Dept. of Health Care Management, Catholic University of Pusan)
  • Received: 2023.02.25
  • Accepted: 2023.04.17
  • Published: 2023.04.30

Abstract

Apache Hadoop, a representative framework for distributed big data processing, can scale a cluster to thousands of nodes to improve parallel distributed processing performance. Reducing the cluster size, however, is limited to permanently decommissioning nodes that are defective or whose performance has degraded, which makes it difficult to operate nodes flexibly in small clusters. In this paper, we discuss the problems that arise when nodes are removed from a Hadoop cluster and propose a dynamic down-scaling technique for managing the size of a distributed cluster more elastically. Instead of decommissioning a node that is removed for temporary down-scaling, the proposed approach temporarily detaches the node and reconnects it when it is needed again; we design and implement a modified Hadoop system and interfaces that support this. Experimental results show that the cluster can be down-scaled effectively without performance degradation.
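The contrast between permanent decommissioning and temporary detach-and-reconnect can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the author's modified Hadoop: it assumes an in-memory cluster in which each node holds a set of block IDs and the cluster has a fixed replication factor, and every name in it (`ToyCluster`, `Node`, `NodeState`, `pause`, `resume`, `decommission`, `transfers`) is hypothetical. Decommissioning copies under-replicated blocks to other active nodes before retiring the node, loosely mirroring how HDFS drains a DataNode, whereas pausing only changes the node's state, so reconnecting it requires no data movement.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"                  # temporarily detached; blocks stay on disk
    DECOMMISSIONED = "decommissioned"  # permanently retired; blocks drained


@dataclass
class Node:
    name: str
    blocks: set = field(default_factory=set)
    state: NodeState = NodeState.ACTIVE


class ToyCluster:
    """Hypothetical in-memory model contrasting decommission with pause/resume."""

    def __init__(self, nodes, replication=3):
        self.nodes = {n.name: n for n in nodes}
        self.replication = replication
        self.transfers = 0  # total block copies made between nodes

    def active(self):
        return [n for n in self.nodes.values() if n.state is NodeState.ACTIVE]

    def live_replicas(self, block):
        # only replicas on ACTIVE nodes can serve reads
        return sum(1 for n in self.active() if block in n.blocks)

    def decommission(self, name):
        """Retire a node permanently, copying its blocks to other active nodes."""
        node = self.nodes[name]
        node.state = NodeState.DECOMMISSIONED
        for block in sorted(node.blocks):
            while self.live_replicas(block) < self.replication:
                targets = [n for n in self.active() if block not in n.blocks]
                if not targets:
                    break  # too few nodes left to restore the replication factor
                targets[0].blocks.add(block)
                self.transfers += 1
        node.blocks.clear()

    def pause(self, name):
        """Detach an active node without moving data, unless a block would lose its last live replica."""
        node = self.nodes[name]
        if node.state is not NodeState.ACTIVE:
            raise RuntimeError(f"{name} is not active")
        for block in node.blocks:
            if self.live_replicas(block) <= 1:
                raise RuntimeError(f"pausing {name} would make {block} unavailable")
        node.state = NodeState.PAUSED

    def resume(self, name):
        """Reconnect a paused node; its blocks are reused as-is, so nothing is copied."""
        node = self.nodes[name]
        if node.state is not NodeState.PAUSED:
            raise RuntimeError(f"{name} is not paused")
        node.state = NodeState.ACTIVE
```

With three nodes and `replication=2`, pausing and resuming one node leaves `transfers` at 0, while decommissioning the same node copies each of its blocks that falls below the replication factor to another active node. The `pause` guard reflects one plausible design choice: a node is only detached if every block it holds still has a live replica elsewhere.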

Acknowledgement

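This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science, ICT and Future Planning) in 2016 (No. NRF-2016R1C1B1012364).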