Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon (Dept. of Computer Science & Engineering, Korea Polytechnic University)
  • Received : 2017.04.06
  • Accepted : 2017.05.25
  • Published : 2017.08.31

Abstract

Recently, the importance of big data has grown with the spread of smartphones, the web, and social networking services (SNS). As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention for its scalability and stability. Because big data is large in volume, generated quickly, and varied in its properties, it is more efficient to process summary information about the data than the data itself. The wavelet histogram, a typical technique for generating data summaries, can generate optimal summary information without losing the information in the original data. Systems that apply MapReduce-based wavelet histogram generation have therefore been actively studied. However, existing approaches are slow because they generate the wavelet histogram through one or more MapReduce jobs, and the error of the data restored from the wavelet histogram is likely to become large. In contrast, the MapReduce-based wavelet histogram generation system developed in this paper generates the wavelet histogram in a single MapReduce job, which greatly increases generation speed. In addition, because the wavelet histogram is generated according to an error bound specified by the user, the error of the data restored from the wavelet histogram can be controlled. Finally, we verify the efficiency of the developed system through performance evaluation.
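The idea of an error-bounded wavelet histogram can be illustrated with a minimal single-machine sketch. This is not the system described in the paper: the function names (`frequency_vector`, `haar_transform`, `haar_inverse`, `build_synopsis`), the unnormalized Haar decomposition, and the greedy rule for dropping coefficients are all assumptions chosen for illustration, and a plain `Counter` stands in for the counting that a MapReduce job would perform. The domain size is assumed to be a power of two.

```python
from collections import Counter


def frequency_vector(records, domain_size):
    """Count occurrences of each key in [0, domain_size).

    Stand-in for the map (emit key, 1) and reduce (sum) steps of a job.
    """
    counts = Counter(records)
    return [counts.get(key, 0) for key in range(domain_size)]


def haar_transform(freq):
    """Unnormalized Haar decomposition: pairwise averages and half-differences.

    Assumes len(freq) is a power of two.
    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(freq)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser levels end up in front
        current = averages
    return current + coeffs


def haar_inverse(coeffs):
    """Rebuild the frequency vector from the coefficient list."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        restored = []
        for average, detail in zip(current, details):
            restored.extend([average + detail, average - detail])
        current = restored
    return current


def max_abs_error(restored, freq):
    return max(abs(r - f) for r, f in zip(restored, freq))


def build_synopsis(freq, error_bound):
    """Greedily zero out small coefficients while the maximum absolute
    reconstruction error stays within error_bound (illustrative rule only)."""
    kept = haar_transform(freq)
    order = sorted(range(len(kept)), key=lambda i: abs(kept[i]))
    for i in order:
        saved = kept[i]
        kept[i] = 0.0
        if max_abs_error(haar_inverse(kept), freq) > error_bound:
            kept[i] = saved  # dropping this coefficient would break the bound
    return kept


if __name__ == "__main__":
    records = [0, 0, 1, 1, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7]
    freq = frequency_vector(records, domain_size=8)
    synopsis = build_synopsis(freq, error_bound=1.0)
    nonzero = sum(1 for c in synopsis if c != 0)
    print("frequencies:", freq)
    print("coefficients kept:", nonzero, "of", len(synopsis))
    print("max error:", max_abs_error(haar_inverse(synopsis), freq))
```

A larger `error_bound` lets the sketch drop more coefficients and so produce a smaller summary, at the cost of a less accurate restored frequency vector; with `error_bound=0` the data is restored exactly.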

References

  1. Y. Matias, J. S. Vitter, and M. Wang, "Wavelet-based histograms for selectivity estimation," in Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, NY, 1998, pp. 448-459.
  2. E. J. Stollnitz, T. D. DeRose, and D. H. Salesin, Wavelets for Computer Graphics: Theory and Applications, San Francisco, CA: Morgan Kaufmann, 1996.
  3. MapReduce Tutorial [Online]. Available: https://hadoop.apache.org/docs/r1.2.1/mapred-tutorial.html.
  4. M. Garofalakis and P. B. Gibbons, "Probabilistic wavelet synopses," ACM Transactions on Database Systems, vol. 29, no. 1, pp. 43-90, 2004. https://doi.org/10.1145/974750.974753
  5. Y. Shi, X. Meng, F. Wang, and Y. Gan, "HEDC: a histogram estimator for data in the cloud," in Proceedings of the 4th International Workshop on Cloud Data Management, Maui, HI, 2012, pp. 51-58.
  6. V. Poosala, Y. E. Ioannidis, P. J. Haas, and E. J. Shekita, "Improved histograms for selectivity estimation of range predicates," in Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, New York, NY, 1996, pp. 294-305.
  7. J. Jestes, K. Yi, and F. Li, "Building wavelet histograms on large data in MapReduce," Proceedings of the VLDB Endowment, vol. 5, no. 2, pp. 109-120, 2011. https://doi.org/10.14778/2078324.2078327
  8. P. Cao and Z. Wang, "Efficient top-k query calculations in distributed networks," in Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing, New York, NY, 2004, pp. 206-215.
  9. Wikipedia Page Traffic Statistics [Online]. Available: http://aws.amazon.com/datasets/2596.

Cited by

  1. Job Allocation Mechanism for Battery Consumption Minimization of Cyber-Physical-Social Big Data Processing Based on Mobile Cloud Computing, IEEE Access (ISSN 2169-3536), vol. 6, 2018. https://doi.org/10.1109/ACCESS.2018.2803730