The Design of Method for Efficient Processing of Small Files in the Distributed System based on Hadoop Framework

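  • Seung-Hyun Kim (Department of Computer Science, Sunchon National University)
  • Young-Geun Kim (Department of Computer Science, Sunchon National University)
  • Won-Jung Kim (Department of Computer Engineering, Sunchon National University)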
  • Received : 2015.09.11
  • Accepted : 2015.10.23
  • Published : 2015.10.31

Abstract

The Hadoop framework was designed to process very large files. When it processes small files, however, it wastes the resources of the distributed system and degrades analysis performance, and the effect becomes more pronounced as the number of small files grows. Because the problem arises from the files being small, it can be solved by merging related small files. Existing small-file merging methods, however, have secondary limitations. This paper therefore examines the limitations of existing merging methods and designs a merging method for processing small files efficiently.
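The resource cost of small files can be illustrated with a rough back-of-the-envelope model. The sketch below is an estimate under stated assumptions, not the authors' analysis: it uses the commonly cited rule of thumb that the HDFS NameNode keeps roughly 150 bytes of memory per file object and per block object, assumes a 128 MiB block size (the Hadoop 2.x default), and assumes one map task per block, as with the default FileInputFormat when the split size equals the block size. Directories, replication, and split slop are ignored, and the function name `hdfs_cost` is hypothetical.

```python
import math

BYTES_PER_OBJECT = 150            # rule of thumb for NameNode memory, not exact
BLOCK_SIZE = 128 * 1024 * 1024    # assumed HDFS block size: 128 MiB


def hdfs_cost(file_sizes, block_size=BLOCK_SIZE):
    """Approximate NameNode objects, NameNode memory, and map tasks.

    Assumes non-empty files, one NameNode object per file and per block,
    and one map task per block (split size equal to block size).
    """
    blocks = sum(math.ceil(size / block_size) for size in file_sizes)
    objects = len(file_sizes) + blocks
    return {
        "objects": objects,
        "namenode_bytes": objects * BYTES_PER_OBJECT,
        "map_tasks": blocks,
    }


MiB = 1024 * 1024
small = [1 * MiB] * 100_000   # 100,000 files of 1 MiB each
merged = [sum(small)]         # the same bytes packed into a single file

print(hdfs_cost(small))
# {'objects': 200000, 'namenode_bytes': 30000000, 'map_tasks': 100000}
print(hdfs_cost(merged))
# {'objects': 783, 'namenode_bytes': 117450, 'map_tasks': 782}
```

Under these assumptions, packing the same 100,000 MiB of data into one file reduces the NameNode object count from 200,000 to 783 and the number of map tasks from 100,000 to 782, which is why merging related small files is an attractive remedy.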

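The merging idea can be sketched as packing each group of related small files into one container plus an index of byte offsets, so that any original file can still be read back by name. This is a generic illustration, not the method designed in this paper and not the on-disk format of Hadoop Archives (HAR) or SequenceFile. The grouping rule (files sharing a parent directory count as related), the function names, and the sample sensor files are all hypothetical, and the code works on in-memory bytes rather than on HDFS.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_directory(files):
    """Group small files by parent directory (hypothetical association rule)."""
    groups = defaultdict(dict)
    for path, data in files.items():
        groups[str(PurePosixPath(path).parent)][path] = data
    return dict(groups)


def merge_group(files):
    """Concatenate one group into a single blob and record (offset, length)."""
    blob = bytearray()
    index = {}
    for path in sorted(files):  # sorted for a deterministic layout
        data = files[path]
        index[path] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_member(blob, index, path):
    """Read one original file back out of the merged blob via the index."""
    offset, length = index[path]
    return blob[offset:offset + length]


# Hypothetical sample data: three small CSV files in two directories.
files = {
    "/sensors/a/2015-09-01.csv": b"t,v\n1,20\n",
    "/sensors/a/2015-09-02.csv": b"t,v\n2,21\n",
    "/sensors/b/2015-09-01.csv": b"t,v\n1,18\n",
}

containers = {d: merge_group(g) for d, g in group_by_directory(files).items()}
blob, index = containers["/sensors/a"]
print(len(containers))                                         # 2
print(read_member(blob, index, "/sensors/a/2015-09-02.csv"))  # b't,v\n2,21\n'
```

Keeping the index separate from the packed data is what preserves random access to individual files after merging; a real system would also have to persist the index and define which files count as related.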
