Adaptable I/O System based I/O Reduction for Improving the Performance of HDFS

  • Park, Jung Kyu ;
  • Kim, Jaeho ;
  • Koo, Sungmin ;
  • Baek, Seungjae (Seungjae Baek is with the Korea Institute of Ocean Science and Technology)
  • Received : 2016.10.10
  • Accepted : 2016.12.12
  • Published : 2016.12.30

Abstract

In this paper, we propose HDFS-AIO, a framework that enhances HDFS with the Adaptable I/O System (ADIOS). ADIOS supports many different I/O methods and lets applications select the optimal I/O routines for a particular platform without source-code modification or recompilation. First, we customize ADIOS into a chunk-based storage system so that its API semantics fit the requirements of HDFS; then we use the Java Native Interface (JNI) to bridge HDFS and the tailored ADIOS. We compare HDFS-AIO with the original HDFS under different I/O patterns, and the experimental results demonstrate the feasibility and benefits of the design. We also examine the performance of HDFS-AIO with various I/O techniques. Although many studies have used ADIOS, our work is expected to help extend the functionality of HDFS.
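
The core idea in the abstract, a chunk-based store whose I/O routine is selected at run time rather than compiled in, can be illustrated with a minimal sketch. This is not the HDFS-AIO implementation and does not use the ADIOS API or JNI: the names `ChunkStore`, `IO_METHODS`, `write_buffered`, `write_posix`, and `roundtrip` are hypothetical, and the two write routines merely stand in for the different I/O methods that ADIOS would provide.

```python
import os
import tempfile
from typing import Callable, Dict


def write_buffered(path: str, data: bytes) -> None:
    """Write through Python's default buffered file object."""
    with open(path, "wb") as f:
        f.write(data)


def write_posix(path: str, data: bytes) -> None:
    """Write with low-level POSIX-style calls (os.open / os.write)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)


# Hypothetical registry: the method is looked up by name, so switching
# I/O routines needs a configuration change only, not a code change.
IO_METHODS: Dict[str, Callable[[str, bytes], None]] = {
    "buffered": write_buffered,
    "posix": write_posix,
}


class ChunkStore:
    """Stores each chunk as one file under a root directory (assumed layout)."""

    def __init__(self, root: str, method: str = "buffered") -> None:
        self.root = root
        self.write = IO_METHODS[method]  # chosen at run time

    def _path(self, chunk_id: int) -> str:
        return os.path.join(self.root, f"chunk_{chunk_id}")

    def put(self, chunk_id: int, data: bytes) -> None:
        self.write(self._path(chunk_id), data)

    def get(self, chunk_id: int) -> bytes:
        with open(self._path(chunk_id), "rb") as f:
            return f.read()


def roundtrip(method: str, payload: bytes) -> bytes:
    """Write one chunk with the named method and read it back."""
    with tempfile.TemporaryDirectory() as root:
        store = ChunkStore(root, method)
        store.put(0, payload)
        return store.get(0)
```

Calling `roundtrip("posix", data)` and `roundtrip("buffered", data)` exercises the same store through two different write paths; in the framework described above, the analogous selection would happen inside the tailored ADIOS layer behind the JNI bridge.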
