Implement of MapReduce-based Big Data Processing Scheme for Reducing Big Data Processing Delay Time and Store Data

Implementation of a MapReduce-Based Big Data Processing Technique for Reducing Big Data Processing Time and Improving Storage Efficiency

  • Lee, Hyeopgeon (Department of Data Analysis, Seoul Gangseo Campus of Korea Polytechnic) ;
  • Kim, Young-Woon (Department of Data Analysis, Seoul Gangseo Campus of Korea Polytechnic) ;
  • Kim, Ki-Young (Department of Computer Software, Seoil University)
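  • 이협건 (Department of Data Analysis, Seoul Gangseo Campus, Korea Polytechnic) ;
  • 김영운 (Department of Data Analysis, Seoul Gangseo Campus, Korea Polytechnic) ;
  • 김기영 (Department of Software Engineering, Seoil University)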
  • Received : 2018.08.06
  • Accepted : 2018.10.20
  • Published : 2018.10.28

Abstract

MapReduce, a core technology of Hadoop, is the most common way to process big data stored on the Hadoop Distributed File System (HDFS). However, existing MapReduce-based big data processing techniques split and store files in blocks of a size predefined in HDFS, which wastes a large amount of infrastructure resources. This paper therefore proposes an efficient MapReduce-based big data processing scheme. The proposed method improves the storage efficiency of a big data infrastructure by converting and compressing the data in advance into a format suitable for MapReduce processing. In addition, the proposed method resolves the data processing delay that can arise when an implementation focuses on storage efficiency.

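MapReduce is an essential core technology of Hadoop and the most widely used means of processing big data on top of the Hadoop Distributed File System. Because existing MapReduce-based big data processing techniques store files split according to the block size fixed in the Hadoop Distributed File System, however, they waste infrastructure resources severely. This paper accordingly proposes an efficient MapReduce-based big data processing technique. The proposed technique converts and compresses the data to be processed, ahead of time, into a form suited to MapReduce processing, which increases the storage efficiency of the big data infrastructure environment. It also resolves the data processing delay that can occur when an implementation is built with storage efficiency as its main focus.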
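
The storage argument in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and the abstract does not state which file format or codec the paper uses. The 128 MB block size (the default in recent Hadoop releases), the function names, and the length-prefixed gzip container are all assumptions made for illustration. The sketch counts how many HDFS blocks a set of small files occupies when each file is stored separately, then packs the same records into one compressed container and counts again.

```python
import gzip
import io
import math
import struct

# Assumed HDFS block size; 128 MB is the default in recent Hadoop releases.
BLOCK_SIZE = 128 * 1024 * 1024


def block_count(file_sizes, block_size=BLOCK_SIZE):
    """Number of HDFS blocks used when every file is stored on its own."""
    return sum(math.ceil(size / block_size) for size in file_sizes)


def pack_records(records):
    """Pack (key, value) byte pairs into one gzip-compressed container.

    Hypothetical format: each record is two big-endian 32-bit lengths
    followed by the key bytes and the value bytes.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for key, value in records:
            gz.write(struct.pack(">II", len(key), len(value)))
            gz.write(key)
            gz.write(value)
    return buffer.getvalue()


def unpack_records(blob):
    """Inverse of pack_records: yield the original (key, value) pairs."""
    data = gzip.decompress(blob)
    offset = 0
    while offset < len(data):
        key_len, value_len = struct.unpack_from(">II", data, offset)
        offset += 8
        key = data[offset:offset + key_len]
        offset += key_len
        value = data[offset:offset + value_len]
        offset += value_len
        yield key, value


if __name__ == "__main__":
    # 1000 small log files of 1000 bytes each (synthetic example data).
    records = [
        (f"log-{i}.txt".encode(), b"sensor=ok;temp=21.5\n" * 50)
        for i in range(1000)
    ]
    packed = pack_records(records)
    print(block_count([len(value) for _, value in records]))  # 1000
    print(block_count([len(packed)]))  # 1
```

Storing the files individually produces one block entry per file, while the packed container fits in a single block. Unpacking adds decompression work at read time, which is the kind of processing-delay trade-off the abstract refers to.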
