A PCA-based Data Stream Reduction Scheme for Sensor Networks

  • Published : 2009.08.30

Abstract

The emerging notion of the data stream has brought many new challenges to the research community because it differs conceptually from the conventional notion of simple data. One typical example is data stream processing in sensor networks. The range of data processing considerations in a sensor network is wide, from physical resource restrictions such as bandwidth, energy, and memory to the peculiarities of query processing, including continuous queries and other specific query types. In this paper, we consider one of these physical constraints, limited memory, and propose a new scheme for data stream reduction based on Principal Component Analysis (PCA). PCA transforms a number of possibly correlated variables into a smaller number of uncorrelated variables. We adapt PCA to the data stream of a sensor network, assuming that a query engine (or application) cooperates with a network base station. Our method exploits the spatio-temporal correlation among multiple measurements from different sensors. Finally, we present a new framework for data processing and describe a number of experiments under this framework, in which we compare our scheme with the wavelet transform and observe the effect of time stamps on the compression ratio. We report selected results from these experiments.

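Much research is needed to bridge the conceptual gap between the new notion of the data stream and conventional simple data. A representative example is data stream processing in sensor networks, where the considerations are wide-ranging, from resource limits such as bandwidth, energy, and memory to the particularities of query processing, including continuous queries. In this paper, to address the limited-memory problem, one of the physical constraints in data stream processing, we propose a data stream reduction scheme based on the PCA technique. PCA transforms a large number of mutually correlated variables into a smaller number of uncorrelated variables. Assuming the cooperation of a query processing engine, we apply PCA to stream data processing in sensor networks and exploit the spatio-temporal correlation among the many measurements obtained from different sensors. Finally, we present a framework for such data processing and analyze the performance of the scheme through various experiments.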
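The general idea of PCA-based reduction described above can be illustrated with a minimal sketch. It is not the framework evaluated in the paper: the function names (`pca_reduce`, `pca_reconstruct`, `compression_ratio`), the window layout (rows are time steps, columns are sensors), the synthetic data, and the way stored values are counted are all assumptions made for illustration. Time stamps are not modeled here.

```python
import numpy as np

def pca_reduce(window, k):
    """Project a (time x sensors) window onto its top-k principal components."""
    mean = window.mean(axis=0)
    centered = window - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                # shape (k, n_sensors)
    scores = centered @ components.T   # shape (n_samples, k)
    return mean, components, scores

def pca_reconstruct(mean, components, scores):
    """Approximate the original window from the reduced representation."""
    return scores @ components + mean

def compression_ratio(window, mean, components, scores):
    """Original value count divided by stored value count (assumed metric)."""
    stored = mean.size + components.size + scores.size
    return window.size / stored

# Synthetic example (assumption): 8 sensors observing one shared signal
# with different gains and offsets, plus a small amount of noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 200)
gains = rng.uniform(0.5, 1.5, size=8)
offsets = rng.uniform(15.0, 25.0, size=8)
noise = 0.01 * rng.standard_normal((200, 8))
window = np.outer(np.sin(t), gains) + offsets + noise

mean, components, scores = pca_reduce(window, k=1)
approx = pca_reconstruct(mean, components, scores)
rmse = float(np.sqrt(np.mean((window - approx) ** 2)))
ratio = compression_ratio(window, mean, components, scores)
print(f"RMSE: {rmse:.4f}, compression ratio: {ratio:.2f}")
```

Because the synthetic sensors are strongly correlated, a single component captures almost all of the variance. With weakly correlated sensors, more components would be needed for the same error and the compression ratio would drop accordingly.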
