Adaptive Buffer Control over Disordered Streams

Adaptive Buffer Control Technique for Processing Disordered Streams

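  • 김현규 (Department of Computer Science, KAIST)
  • 김철기 (Department of Computer Science, Information and Communications University)
  • 이충호 (Telematics & USN Research Division, Electronics and Telecommunications Research Institute (ETRI))
  • 김명호 (Department of Computer Science, KAIST)
  • Published: 2007.10.15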

Abstract

Disordered streams may cause inaccurate or delayed results in window-based queries. Existing approaches usually use buffers to handle such streams. However, most of them estimate the buffer size simply from the maximum network delay in the streams, which tends to over-estimate the buffer size and thus to increase latency. In this paper, we propose a probabilistic approach that estimates the buffer size adaptively as network delays fluctuate. We first assume that the intervals between tuple generations follow an exponential distribution and that network delays follow a normal distribution. We then derive an estimation function from these assumptions. The function takes a drop ratio as an input parameter, which denotes the percentage of tuple drops permissible during query execution. By specifying the drop ratio in the query, users can decide whether query results should favor accuracy or low latency, according to application requirements. Our experimental results show that the proposed function adapts better than the existing function based on the maximum network delay.
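To make the role of the drop ratio concrete, the derivation below uses a deliberately simplified model. It is an illustrative assumption, not the estimation function derived in the paper, and it ignores the exponentially distributed generation intervals. Suppose that at clock time c the buffer releases every tuple with timestamp at most c − B, and that a tuple generated at time t suffers a network delay D ~ N(μ, σ²). The tuple arrives at t + D, while timestamp t is released at clock time t + B, so the tuple is late (dropped) exactly when D > B. Setting the drop probability equal to the drop ratio p gives:

```latex
P(D > B) = 1 - \Phi\left(\frac{B - \mu}{\sigma}\right) = p
\quad\Longrightarrow\quad
B = \mu + \sigma \, \Phi^{-1}(1 - p)
```

Here Φ is the standard normal CDF. With hypothetical values μ = 50 ms, σ = 10 ms and p = 0.05, Φ⁻¹(0.95) ≈ 1.645 and B ≈ 66.4 ms. Lowering p enlarges B and therefore latency, whereas a bound based on the maximum observed delay is dictated by the single worst outlier.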

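Disordered streams can cause inaccurate or delayed results when window-based queries are processed. Existing approaches generally reorder disordered streams in a buffer and estimate the buffer size from the maximum network delay. However, such approaches can estimate an unnecessarily large buffer size and thereby delay query results. This paper proposes a probabilistic approach that adaptively estimates the buffer size as network delays change. The proposed approach assumes that tuple generation follows a Poisson distribution and that network delays follow a normal distribution, and it derives an estimation function from these assumptions. The estimation function takes the tuple drop ratio as an input parameter, which represents the percentage of tuple drops that is tolerable at run time. By defining the drop ratio in the query statement, users can emphasize either the accuracy or the processing speed of query results, depending on application requirements. Our experimental results show that the proposed estimation function is more adaptive than the existing estimation function based on the maximum network delay.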
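The abstract describes the estimator only at a high level, so the Python sketch below illustrates adaptive, drop-ratio-driven buffer sizing under an assumed model that is not the authors' derived function: delays are fitted to a normal distribution over a sliding window of recent tuples, a tuple counts as dropped when its delay exceeds the buffer size B, and B is therefore taken as the (1 − p) quantile of the fitted distribution for drop ratio p. The class `AdaptiveBufferEstimator`, its `window` parameter, and the `simulate` helper with its made-up delay parameters (mean 50, standard deviation 10) are hypothetical. Exponential generation intervals appear only in the simulated workload; the simplified drop rule does not use them.

```python
import random
from collections import deque
from statistics import NormalDist, fmean, stdev


class AdaptiveBufferEstimator:
    """Hypothetical sketch of drop-ratio-driven buffer sizing.

    Assumed model (not the paper's derived function): network delays are
    fitted to a normal distribution over a sliding window, and a tuple is
    dropped when its delay exceeds the buffer size B.
    """

    def __init__(self, drop_ratio: float, window: int = 100) -> None:
        if not 0.0 < drop_ratio < 1.0:
            raise ValueError("drop_ratio must lie strictly between 0 and 1")
        self.drop_ratio = drop_ratio
        # Only the most recent `window` delays are kept, so the estimate
        # follows fluctuations in network delay.
        self.delays = deque(maxlen=window)

    def observe(self, generated_at: float, arrived_at: float) -> None:
        """Record the network delay of one arriving tuple."""
        self.delays.append(arrived_at - generated_at)

    def buffer_size(self) -> float:
        """Return B = mu + sigma * Phi^-1(1 - drop_ratio) for the window."""
        if len(self.delays) < 2:
            # Too few samples to fit a distribution; fall back to the max.
            return max(self.delays, default=0.0)
        mu = fmean(self.delays)
        sigma = stdev(self.delays)
        if sigma == 0.0:
            return mu  # all recent delays are identical
        return NormalDist(mu, sigma).inv_cdf(1.0 - self.drop_ratio)

    def max_delay_buffer(self) -> float:
        """Baseline: buffer sized by the maximum delay in the window."""
        return max(self.delays, default=0.0)


def simulate(drop_ratio: float = 0.05, n: int = 10_000,
             seed: int = 7) -> tuple[float, float, float]:
    """Illustrative run with made-up parameters.

    Generation gaps ~ Exp(rate 1) and delays ~ N(50, 10^2). Returns the
    fraction of a fresh batch of tuples whose delay exceeds the adaptive
    buffer, plus the adaptive and max-delay buffer sizes.
    """
    rng = random.Random(seed)
    est = AdaptiveBufferEstimator(drop_ratio, window=n)
    t = 0.0
    for _ in range(n):
        t += rng.expovariate(1.0)  # exponential generation interval
        est.observe(t, t + rng.gauss(50.0, 10.0))
    b = est.buffer_size()
    # Under the assumed model a tuple is late exactly when its delay > B.
    late = sum(rng.gauss(50.0, 10.0) > b for _ in range(n))
    return late / n, b, est.max_delay_buffer()


if __name__ == "__main__":
    frac, adaptive, worst = simulate()
    print(f"drop fraction={frac:.3f} adaptive B={adaptive:.1f} "
          f"max-delay B={worst:.1f}")
```

Running `simulate()` should give an empirical drop fraction close to the requested 0.05 and a max-delay buffer noticeably larger than the adaptive one, which mirrors the over-estimation attributed to max-delay sizing. A smaller `window` lets the estimate react faster to delay fluctuations at the cost of noisier estimates.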
