
Performance Enhancement and Evaluation of Distributed File System for Cloud


  • Jong-Hyeok Lee (Department of Big Data Engineering, Daegu Catholic University)
  • Received : 2018.08.29
  • Accepted : 2018.09.27
  • Published : 2018.11.30

Abstract

Loading big data and processing it at high speed through subsequent applications in a cloud environment requires choosing a suitable distributed file system. In this paper, we propose a write-performance improvement method based on GlusterFS and compare its performance with existing distributed file systems for the cloud: MapRFS, CephFS, and GlusterFS. The proposed method improves response time by changing the synchronization level used by synchronous replication from disk to memory. Experimental results show that the distributed file system with the proposed method outperforms the other distributed file systems for sequential writes and for mixed random write/random read workloads.
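To make the mechanism concrete: conventional synchronous replication acknowledges a write only after each replica has forced the data to disk, whereas the proposed method (shown as eGlusterFS in Fig. 1) acknowledges once the data reaches each replica's memory, deferring the disk flush. The Python sketch below is a minimal, hypothetical illustration of that distinction; the actual work modifies GlusterFS's replication internals, and every function and parameter name here is invented for illustration.

```python
import os

def replicated_write(fd: int, data: bytes, sync_level: str = "memory") -> None:
    """Apply one replica's share of a synchronously replicated write.

    sync_level == "disk":   acknowledge only after fsync() has forced the
                            data to stable storage (conventional behaviour).
    sync_level == "memory": acknowledge as soon as write() has copied the
                            data into the OS page cache, deferring the flush
                            (the change the paper proposes for GlusterFS).
    """
    os.write(fd, data)        # data now sits in kernel memory (page cache)
    if sync_level == "disk":
        os.fsync(fd)          # block until the data is durable on disk
    send_ack()                # hypothetical reply to the coordinating node


def send_ack() -> None:
    """Placeholder for the replication protocol's acknowledgement message."""
    pass
```

Deferring fsync() removes the slowest step from the acknowledgement path, which is where the response-time gain comes from; durability until the eventual flush then rests on the data being held in memory, presumably on more than one replica.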



Fig. 1. File Synchronization in eGlusterFS

Fig. 2. Sequential Write Throughput (MB/s) in a General SSD Test Environment

Fig. 3. Sequential Write Throughput (MB/s) in an NVMe SSD Test Environment

Fig. 4. Sequential Read Throughput (MB/s) in a General SSD Test Environment

Fig. 5. Sequential Read Throughput (MB/s) in an NVMe SSD Test Environment
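For context on how sequential-throughput numbers like those in Figs. 2-5 are typically gathered, fio is a standard benchmark for this kind of measurement. The snippet below is an illustrative sketch, not the authors' exact configuration; the mount point /mnt/dfs and all parameter values are assumptions.

```python
import subprocess

# Illustrative fio run for a sequential-write throughput test (swap
# --rw=write for --rw=read to measure sequential reads as in Figs. 4-5).
# All values below are assumed for illustration, not the paper's settings.
subprocess.run([
    "fio",
    "--name=seqwrite",
    "--rw=write",             # sequential writes
    "--bs=1m",                # 1 MiB blocks, typical for throughput tests
    "--size=1g",              # total data written by the job
    "--direct=1",             # bypass the client-side page cache
    "--ioengine=libaio",
    "--directory=/mnt/dfs",   # assumed mount point of the file system under test
    "--group_reporting",
], check=True)
```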

Fig. 6. OLTP Throughput (Transactions/sec) in a General SSD Test Environment

Fig. 7. OLTP Throughput (Transactions/sec) in an NVMe SSD Test Environment
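The OLTP numbers in Figs. 6-7 mix random reads and writes through a database workload; sysbench's oltp_read_write test is a common way to produce such a figure. The sketch below assumes sysbench 1.0+ against a local MySQL instance whose data directory lives on the file system under test; the host, credentials, and sizes are illustrative assumptions, not the paper's configuration.

```python
import subprocess

# Illustrative sysbench OLTP run; reported transactions/sec corresponds to
# the metric plotted in Figs. 6-7.
common = [
    "sysbench", "oltp_read_write",
    "--mysql-host=127.0.0.1",     # assumed local MySQL instance
    "--mysql-user=sbtest",
    "--mysql-password=sbtest",
    "--tables=4",                 # assumed table count
    "--table-size=100000",        # assumed rows per table
]
subprocess.run(common + ["prepare"], check=True)                             # load test data
subprocess.run(common + ["--threads=16", "--time=300", "run"], check=True)   # measure tps
```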

Table 1. Experimental Environments

