LDBAS: Location-aware Data Block Allocation Strategy for HDFS-based Applications in the Cloud

  • Xu, Hua (Department of Computer Science and Technology, University of Science and Technology of China)
  • Liu, Weiqing (Department of Computer Science and Technology, University of Science and Technology of China)
  • Shu, Guansheng (Department of Computer Science and Technology, University of Science and Technology of China)
  • Li, Jing (Department of Computer Science and Technology, University of Science and Technology of China)
  • Received: 2016.12.12
  • Accepted: 2017.11.09
  • Published: 2018.01.31

Abstract

Big data processing applications have gradually migrated to the cloud to take advantage of cloud computing. The Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks such as Hadoop and Spark. Because HDFS is not aware of the co-location of virtual machines in the cloud, its default block allocation scheme fits cloud environments poorly in two respects: loss of data reliability and performance degradation. In this paper, we present a novel location-aware data block allocation strategy (LDBAS). LDBAS jointly optimizes data reliability and performance for upper-layer applications by allocating data blocks according to the locations and the differing processing capacities of virtual nodes in the cloud. We apply LDBAS to the two stages of data allocation in HDFS in the cloud, initial data allocation and data recovery, and design the corresponding algorithms. Finally, we implement LDBAS in a real Hadoop cluster and evaluate its performance with the BigDataBench benchmark suite. The experimental results show that LDBAS guarantees the designed data reliability while reducing the job execution time of I/O-intensive Hadoop applications by 8.9% on average, and by up to 11.2%, compared with the original Hadoop in the cloud.
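
The abstract does not spell out the LDBAS algorithms, so the following is only a minimal Python sketch of the general idea it describes: replicas of a block are kept on virtual nodes that sit on different physical hosts, and nodes with more processing capacity are favoured. The names `VirtualNode`, `host`, `capacity`, and `place_replicas` are hypothetical, the capacity-weighted random choice is an assumption made for illustration, and none of this is the authors' implementation or an HDFS API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualNode:
    name: str        # hypothetical VM identifier
    host: str        # physical machine the VM runs on (assumed to be known)
    capacity: float  # relative processing capacity, must be positive (assumed weight)


def place_replicas(nodes, replicas=3, rng=None):
    """Pick `replicas` nodes on distinct physical hosts, favouring capacity.

    Illustrative only: a capacity-weighted random choice among nodes whose
    physical host does not already hold a replica of the block.
    """
    rng = rng or random.Random()
    chosen = []
    used_hosts = set()
    while len(chosen) < replicas:
        # Only nodes on hosts that hold no replica yet are eligible.
        eligible = [n for n in nodes if n.host not in used_hosts]
        if not eligible:
            raise ValueError("not enough distinct physical hosts for all replicas")
        weights = [n.capacity for n in eligible]
        pick = rng.choices(eligible, weights=weights, k=1)[0]
        chosen.append(pick)
        used_hosts.add(pick.host)
    return chosen


if __name__ == "__main__":
    cluster = [
        VirtualNode("vm1", "hostA", 1.0),
        VirtualNode("vm2", "hostA", 2.0),  # co-located with vm1
        VirtualNode("vm3", "hostB", 1.5),
        VirtualNode("vm4", "hostC", 0.5),
    ]
    for node in place_replicas(cluster, replicas=3):
        print(node.name, node.host)
```

In this sketch, two virtual machines on the same physical host never both receive a replica of the same block, so a single host failure cannot remove more than one copy. A placement that ignores co-location gives no such guarantee, which is the reliability concern the abstract raises.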
