Performance Improvement of Declustering Algorithm by Efficient Grid-Partitioning Multi-Dimensional Space

A Technique for Improving Declustering Algorithm Performance through Efficient Grid Partitioning of Multi-Dimensional Space

  • 김학철 (Convergence Technology Research Division, Electronics and Telecommunications Research Institute)
  • Received : 2009.12.09
  • Accepted : 2010.02.17
  • Published : 2010.03.30

Abstract

In this paper, we analyze the shortcomings of previous declustering methods for high-dimensional space, which are based on grid-like partitioning and a function that maps each cell to a disk number, and we propose a solution. The problems arise because the number of splits per dimension is small (in most cases binary partitioning is sufficient) and because the side length of a range query is quite large even when its selectivity is small. To address these problems, we propose a mathematical model that estimates the performance of a grid-like partitioning method. With the proposed estimation model, we can choose a good partitioning scheme among the possible ones, which improves overall declustering performance. Experimental results show that this approach improves the performance of a previous declustering method by up to 2.7 times.

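In this paper, we analyze the problems that arise when previously proposed declustering methods, which rely on grid partitioning and a mapping function to improve range-query performance, are applied to multi-dimensional space, and we present a solution. The problems arise because the number of splits in each dimension is small (in most cases only a binary split occurs) and because the side length of a range query in each dimension becomes large even for an extremely small selectivity. To solve this, we present a model that mathematically predicts the performance of the various ways of grid-partitioning a multi-dimensional space. Using the proposed model, one can select, from among the possible grid partitioning methods, one that reduces the number of grid cells overlapping a range query, which in turn improves the overall performance of the declustering algorithm. Experimental results show that applying the proposed partitioning method improves the performance of previously proposed declustering algorithms by up to 2.7 times.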
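
The claim that a range query with small selectivity still has a long side in high dimensions can be checked with a short calculation. The sketch below is illustrative only and is not the paper's estimation model: it assumes a unit hypercube data space, hypercubic range queries, and equal-width splits in each dimension, and the function names are hypothetical.

```python
import math


def query_side_length(selectivity: float, dims: int) -> float:
    """Side length of a hypercubic query covering `selectivity` of the unit space."""
    return selectivity ** (1.0 / dims)


def intervals_overlapped(start: float, side: float, splits: int) -> int:
    """Count the equal-width intervals (out of `splits`) touched by [start, start + side]."""
    width = 1.0 / splits
    first = min(int(start / width), splits - 1)
    last = min(int((start + side) / width), splits - 1)
    return last - first + 1


def cells_overlapped(starts, side, splits_per_dim) -> int:
    """Grid cells touched by the query: product of per-dimension interval counts."""
    return math.prod(
        intervals_overlapped(s, side, k) for s, k in zip(starts, splits_per_dim)
    )


if __name__ == "__main__":
    # Same selectivity (0.01%), increasing dimensionality.
    for d in (2, 4, 8, 16):
        print(d, round(query_side_length(0.0001, d), 4))
```

Under these assumptions, a query selecting 0.01% of a 16-dimensional space has a side of about 0.56. Any such query that fits inside the unit space must cross the midpoint of every dimension, so it touches both halves of each dimension that is split in two, which is why the choice of which dimensions to split, and how often, matters.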
