Processing Sliding Windows over Disordered Streams

A Sliding Window Technique for Processing Disordered Streams

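  • Hyeon Gyu Kim (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Cheolgi Kim (Mobile Multimedia Research Center, Information and Communications University)
  • Myoung Ho Kim (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published: 2006.11.15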

Abstract

Disordered streams raise two issues in processing sliding windows: (i) how to efficiently place input tuples into a buffer in increasing order, and (ii) how to determine, from the input tuples in the buffer, the time point at which to process the windows. To address these issues, we propose a structure and a method for sliding-window operators. We first present an operator structure that uses an index to handle input tuples efficiently. We then propose a method, called mean-based estimation, for determining the time point at which to process the windows. In the proposed method, users can specify the parameters required for estimation in the query specification, which lets them control properties of the query results, such as accuracy or response time, according to application requirements. Our experimental results show that mean-based estimation provides better adaptivity and stability than the estimation used in the existing method.

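Disordered streams cause two problems in generating sliding windows. The first is how to sort the stream efficiently, and the second is how to decide when to generate windows from the sorted stream. In this paper, we propose a structure and a method for window operators that address these problems. We first introduce an operator structure that uses an index to sort and store input tuples efficiently. We then propose a mean-based estimation method for determining when windows are generated. In the proposed technique, the parameters required for estimation can be defined in the query statement, which allows users to adjust properties of the query results, such as accuracy or response time, according to application requirements. Our experimental results show that the proposed mean-based method is superior in adaptivity and stability to the method used in previous work.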
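
The two issues named in the abstract, ordered buffering and deciding when buffered tuples may be processed, can be illustrated with a small sketch. This is not the operator structure or the estimator from the paper, whose details the abstract does not give. It assumes a sorted list as the index, takes "lateness" to mean how far a tuple's timestamp trails the largest timestamp seen so far, and uses the mean of recent lateness values, scaled by a hypothetical user parameter `factor`, as the slack to wait for. The class name `DisorderBuffer` and the parameters `history` and `factor` are illustrative only.

```python
import bisect
from collections import deque


class DisorderBuffer:
    """Illustrative only: sorted buffer plus a mean-of-lateness release rule."""

    def __init__(self, history=100, factor=1.0):
        self.ts = []                          # timestamps, kept in increasing order
        self.vals = []                        # payloads, aligned with self.ts
        self.delays = deque(maxlen=history)   # most recent lateness samples
        self.max_ts = float("-inf")           # largest timestamp seen so far
        self.factor = factor                  # hypothetical accuracy/latency knob

    def insert(self, ts, value):
        # Lateness: how far this tuple trails the newest timestamp seen so far.
        self.delays.append(max(0.0, self.max_ts - ts))
        self.max_ts = max(self.max_ts, ts)
        # Binary search for the slot that keeps the buffer ordered.
        idx = bisect.bisect_right(self.ts, ts)
        self.ts.insert(idx, ts)
        self.vals.insert(idx, value)

    def watermark(self):
        # Assumed rule: newest timestamp minus (factor * mean recent lateness).
        mean_delay = sum(self.delays) / len(self.delays) if self.delays else 0.0
        return self.max_ts - self.factor * mean_delay

    def release(self):
        # Hand over, in order, every tuple at or below the estimated watermark.
        cut = bisect.bisect_right(self.ts, self.watermark())
        out = list(zip(self.ts[:cut], self.vals[:cut]))
        del self.ts[:cut]
        del self.vals[:cut]
        return out


buf = DisorderBuffer(history=4, factor=1.0)
for ts, v in [(1, "a"), (3, "b"), (2, "c"), (5, "d")]:
    buf.insert(ts, v)
released = buf.release()   # tuples a window operator could now consume in order
```

In this sketch a larger `factor` holds tuples longer, which tolerates more disorder at the cost of response time, while a smaller one releases sooner and risks missing late tuples. That is one way to read the accuracy versus response time trade-off the abstract mentions, under the assumptions stated above.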
