A Spatial Hash Strip Join Algorithm for Effective Handling of Skewed Data

  • 심영복 (Department of Computer Education, Chungbuk National University)
  • 이종연 (Department of Computer Education, Chungbuk National University)
  • Published: 2005.10.01

Abstract

In this paper, we focus on the filtering step that identifies candidate objects for spatial join operations on input tables, neither of which is indexed. Over the last decade, several spatial join algorithms for indexed input tables have been studied extensively. These algorithms perform well on most spatial data, but little research has addressed their performance degradation in the presence of skewed data. We therefore propose a spatial hash strip join (SHSJ) algorithm that alleviates the skewed-data problem of the conventional spatial hash join (SHJ) algorithm. The basic idea is similar to that of SHJ; the differences are that bucket capacities are not limited while data are assigned to buckets, and that the scalable sweeping-based spatial join (SSSJ) algorithm is applied to the bucket join operations. Finally, experiments using the TIGER/Line data set show that SHSJ outperforms both the existing SHJ and SSSJ algorithms.

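This paper addresses the filtering step for candidate objects in a spatial join between two input tables that have no index. Existing algorithms in this area generally perform well on spatial joins, but their performance degrades when the objects in the input tables are skewed, and little research has been done on ways to overcome this weakness. To solve the problem of skewed objects in two unindexed input tables, this paper therefore proposes the Spatial Hash Strip Join (SHSJ) algorithm, an improvement on the existing Spatial Hash Join (SHJ) algorithm. SHSJ differs from SHJ in two respects: it places no limit on bucket capacity when assigning the input data sets to buckets, and it uses the SSSJ algorithm in the bucket join step. A performance evaluation of SHSJ using TIGER/Line data verified that, for spatial joins on unindexed input tables with skewed distributions, it outperforms the existing SHJ and SSSJ algorithms.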
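
As a rough illustration of the filter step described above, the following Python sketch joins two sets of minimum bounding rectangles (MBRs) in the SHSJ style. It is a minimal in-memory sketch under stated assumptions, not the authors' implementation: all names (`Rect`, `Extent`, `partition_inner`, `partition_outer`, `plane_sweep`, `shsj_filter`) are hypothetical; inner (R) objects are hashed to one bucket by the uniform grid cell containing their MBR centre, which is an assumed hash function; and a simple forward plane sweep over x stands in for SSSJ, which is itself an I/O-efficient, sweep-based algorithm. Following the SHJ scheme, each R object goes to exactly one bucket with no capacity limit, the bucket's extent grows to cover it, and each S object is copied to every bucket whose extent it overlaps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

Key = Tuple[int, int]  # hypothetical bucket key: (column, row) of a uniform grid cell


class Rect(NamedTuple):
    """MBR of a spatial object, tagged with its object id."""
    oid: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


class Extent(NamedTuple):
    """Bounding box of everything assigned to one bucket."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a, b) -> bool:
    """Closed-interval intersection test for anything with xmin/ymin/xmax/ymax."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def partition_inner(rs: Iterable[Rect], cell: float):
    """Hash each R object to exactly one bucket (the grid cell of its MBR centre).
    No capacity limit: a skewed cell simply yields a large bucket."""
    buckets: Dict[Key, List[Rect]] = defaultdict(list)
    for r in rs:
        cx, cy = (r.xmin + r.xmax) / 2, (r.ymin + r.ymax) / 2
        buckets[(int(cx // cell), int(cy // cell))].append(r)
    # Each bucket's extent grows to cover all of its objects.
    extents = {
        key: Extent(min(r.xmin for r in objs), min(r.ymin for r in objs),
                    max(r.xmax for r in objs), max(r.ymax for r in objs))
        for key, objs in buckets.items()
    }
    return buckets, extents


def partition_outer(ss: Iterable[Rect], extents: Dict[Key, Extent]):
    """Copy each S object into every bucket whose extent it overlaps."""
    buckets: Dict[Key, List[Rect]] = defaultdict(list)
    for s in ss:
        for key, ext in extents.items():
            if overlaps(s, ext):
                buckets[key].append(s)
    return buckets


def plane_sweep(rs: List[Rect], ss: List[Rect]) -> List[Tuple[int, int]]:
    """Forward plane sweep on x (stand-in for SSSJ): all intersecting (r, s) pairs."""
    rs, ss = sorted(rs, key=lambda o: o.xmin), sorted(ss, key=lambda o: o.xmin)
    out, i, j = [], 0, 0
    while i < len(rs) and j < len(ss):
        if rs[i].xmin <= ss[j].xmin:
            r, k = rs[i], j
            # Every ss[k] from here starts at or after r.xmin; stop once past r.xmax.
            while k < len(ss) and ss[k].xmin <= r.xmax:
                if overlaps(r, ss[k]):
                    out.append((r.oid, ss[k].oid))
                k += 1
            i += 1
        else:
            s, k = ss[j], i
            while k < len(rs) and rs[k].xmin <= s.xmax:
                if overlaps(rs[k], s):
                    out.append((rs[k].oid, s.oid))
                k += 1
            j += 1
    return out


def shsj_filter(rs: List[Rect], ss: List[Rect], cell: float) -> Set[Tuple[int, int]]:
    """Filter step: ids of (r, s) pairs whose MBRs intersect."""
    r_buckets, extents = partition_inner(rs, cell)
    s_buckets = partition_outer(ss, extents)
    result: Set[Tuple[int, int]] = set()
    for key, r_objs in r_buckets.items():
        result.update(plane_sweep(r_objs, s_buckets.get(key, [])))
    return result


R = [Rect(1, 0, 0, 2, 2), Rect(2, 0.2, 0.2, 0.8, 0.8), Rect(3, 50, 50, 52, 52)]
S = [Rect(10, 1, 1, 3, 3), Rect(11, 51, 51, 60, 60), Rect(12, 100, 100, 101, 101)]
print(sorted(shsj_filter(R, S, cell=10.0)))  # → [(1, 10), (3, 11)]
```

Because every R object belongs to exactly one bucket, and any S object intersecting it must also intersect that bucket's extent, each candidate pair is reported exactly once and no duplicate elimination is needed. With no capacity limit, a skewed cluster simply produces one large bucket instead of being split, so the per-bucket join must handle buckets of arbitrary size.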

References

  1. M. L. Lo and C. V. Ravishankar, 'Spatial Hash-Joins,' In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 209-220, May 1996 https://doi.org/10.1145/235968.233337
  2. L. Arge, O. Procopiuc, S. Ramaswami, T. Suel, and J. Vitter, 'Scalable Sweeping Based Spatial Join,' In Proceedings of International Conference on Very Large Data Bases, pp. 570-581, Aug. 1998
  3. A. Guttman, 'R-Trees: A Dynamic Index Structure for Spatial Searching,' In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 47-57, Jun. 1984 https://doi.org/10.1145/602259.602266
  4. L. Becker, K. Hinrichs, and U. Finke, 'A New Algorithm for Computing of Spatial Joins Using R-trees,' In Proceedings of the Ninth International Conference on Data Engineering, pp. 190-197, Vienna, Austria, Apr. 1993
  5. T. Brinkhoff, H. Kriegel, R. Schneider, and B. Seeger, 'Multi-Step Processing of Spatial Joins,' In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 197-208, Jun. 1994 https://doi.org/10.1145/191839.191880
  6. R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 3rd edition, Addison-Wesley, pp. 594-600, 2000
  7. M. L. Lo and C. V. Ravishankar, 'Spatial Joins Using Seeded Trees,' In Proceedings of ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, pp. 209-220, May 1994 https://doi.org/10.1145/191839.191881
  8. M. L. Lo and C. V. Ravishankar, 'Generating Seeded Trees from Data Sets,' In the Fourth International Symposium on Large Spatial Databases (Advances in Spatial Databases: SSD '95), Portland, Maine, pp. 328-347, Aug. 1995
  9. N. Mamoulis and D. Papadias, 'Slot Index Spatial Join,' IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 1, Jan./Feb. 2003 https://doi.org/10.1109/TKDE.2003.1161591
  10. M. L. Lo and C. V. Ravishankar, 'The Design and Implementation of Seeded Trees: An Efficient Method for Spatial Joins,' IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 1, pp. 136-151, 1998 https://doi.org/10.1109/69.667097
  11. J. M. Patel and D. J. DeWitt, 'Partition Based Spatial-Merge Join,' In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 259-270, Jun. 1996 https://doi.org/10.1145/233269.233338
  12. N. Koudas and K. Sevcik, 'Size Separation Spatial Join,' In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 324-335, May 1997 https://doi.org/10.1145/253262.253340
  13. R. H. Güting and W. Schilling, 'A Practical Divide-and-Conquer Algorithm for the Rectangle Intersection Problem,' Information Sciences, Vol. 42, No. 2, pp. 95-112, Jul. 1987 https://doi.org/10.1016/0020-0255(87)90018-1
  14. S. T. Leutenegger, J. Edgington, and M. A. Lopez, 'STR: A Simple and Efficient Algorithm for R-Tree Packing,' In Proceedings of International Conference on Data Engineering, pp. 497-506, Apr. 1997 https://doi.org/10.1109/ICDE.1997.582015
  15. U.S. Bureau of the Census, '2002 TIGER/Line Files,' 2002