Continuous Query Processing in Data Streams Using Duality of Data and Queries

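  • Hyo-Sang Lim (Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Jae-Gil Lee (Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • Min-Jae Lee (Research Lab, NEOWIZ Co., Ltd.) ;
  • Kyu-Young Whang (Department of Computer Science, Korea Advanced Institute of Science and Technology)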
  • Published : 2006.06.01

Abstract

This paper addresses the efficient processing of continuous queries in a data stream environment. We classify previous query processing methods into two dual categories, data-initiative and query-initiative, depending on whether processing begins by selecting a data element or a query. This classification stems from the fact that data and queries have been treated asymmetrically. Continuous query processing has traditionally employed only data-initiative methods, so the performance gain obtainable from query-initiative methods has been overlooked. To address this problem, we build on the observation that data and queries can be treated symmetrically. We propose the duality model of data and queries and, based on this model, present a new viewpoint that transforms the continuous query processing problem into a multi-dimensional spatial join problem. We also present a continuous query processing algorithm based on spatial join, named Spatial Join CQ. Spatial Join CQ processes continuous queries by finding the pairs of overlapping regions between a set of data elements and a set of queries, both defined as regions in the multi-dimensional space. Because the spatial join is a symmetric operation, the algorithm achieves the effects of both dual methods. Experimental results show that the proposed algorithm outperforms earlier methods by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join continuous queries.
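
The idea of matching data elements against queries by region overlap can be illustrated with a small sketch. This is not the Spatial Join CQ algorithm itself, whose details the abstract does not give. It is a generic plane-sweep spatial join, chosen here only because it scans both inputs together so that neither side drives the other. All names (`Region`, `point_region`, `overlaps`, `spatial_join`) and the two-attribute example are assumptions made for illustration. Data elements are modeled as degenerate regions (points) and selection queries as hyper-rectangles.

```python
from typing import List, Sequence, Tuple

# A region is one closed interval (low, high) per dimension (hypothetical type).
Region = Tuple[Tuple[float, float], ...]


def point_region(values: Sequence[float]) -> Region:
    """Represent a data element as a degenerate region, i.e. a point."""
    return tuple((v, v) for v in values)


def overlaps(a: Region, b: Region) -> bool:
    """Two regions overlap when their intervals intersect in every dimension."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def spatial_join(data: List[Region], queries: List[Region]) -> List[Tuple[int, int]]:
    """Return sorted (data index, query index) pairs whose regions overlap.

    Plane sweep along dimension 0: both inputs are sorted by lower bound
    and consumed together, which treats data and queries symmetrically.
    """
    d_order = sorted(range(len(data)), key=lambda i: data[i][0][0])
    q_order = sorted(range(len(queries)), key=lambda j: queries[j][0][0])
    result = []
    i = j = 0
    while i < len(d_order) and j < len(q_order):
        d, q = d_order[i], q_order[j]
        if data[d][0][0] <= queries[q][0][0]:
            # The data region starts first: check queries that start within it.
            high = data[d][0][1]
            k = j
            while k < len(q_order) and queries[q_order[k]][0][0] <= high:
                if overlaps(data[d], queries[q_order[k]]):
                    result.append((d, q_order[k]))
                k += 1
            i += 1
        else:
            # The query region starts first: check data that start within it.
            high = queries[q][0][1]
            k = i
            while k < len(d_order) and data[d_order[k]][0][0] <= high:
                if overlaps(data[d_order[k]], queries[q]):
                    result.append((d_order[k], q))
                k += 1
            j += 1
    return sorted(result)


# Three data elements with two attributes, and three range-selection queries.
data = [point_region((5, 5)), point_region((12, 3)), point_region((8, 9))]
queries = [((0, 10), (0, 10)), ((10, 20), (0, 5)), ((7, 9), (8, 12))]
print(spatial_join(data, queries))  # [(0, 0), (1, 1), (2, 0), (2, 2)]
```

Each output pair says that a data element satisfies a query. The same call works whether a batch contains many data elements and few queries or the reverse, which is the symmetry the duality model relies on.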
