Finding Pseudo Periods over Data Streams based on Multiple Hash Functions

Korean title (translated): A Technique for Finding Pseudo-Periods of Items in Data Streams Based on Multiple Hash Functions

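  • 이학주 (Department of Computer Science, Graduate School, Yonsei University) ;
  • 김재완 (Department of Computer Science, Graduate School, Yonsei University) ;
  • 이원석 (Department of Computer Science, Yonsei University)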
  • Received : 2016.09.12
  • Accepted : 2017.02.06
  • Published : 2017.03.31

Abstract

In-memory data stream processing has recently been applied to a wide range of tasks, including query processing, OLAP, and data mining (e.g., frequent itemsets, association rules, and clustering). However, finding regular periodic patterns of events in an unbounded data stream has received less attention. Most research on period detection uses autocorrelation functions to detect changes in periodic patterns rather than the periods themselves, and it usually targets time-series databases rather than data streams. Strictly speaking, a period is the length of the time interval at which a phenomenon recurs. In real applications, however, a data set evolves with small differences as time elapses. A period that recurs with such small variations is called a pseudo-period. This paper proposes FPMH (Finding Periods using Multiple Hash functions), an algorithm that finds a set of such pseudo-periods over a data stream using multiple hash functions. FPMH is divided into three variants according to the type of pseudo-period: FPMH-E, FPMH-PC, and FPMH-PP. To maximize performance in a data stream environment and to keep the most recent periodic patterns in memory, a decay mechanism is applied to the FPMH algorithms. FPMH minimizes memory usage and processing time while maintaining acceptable accuracy.
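
The abstract does not give the internal details of FPMH or of its three variants, so the following Python sketch is only an illustration of the general idea it names: several hash functions map each item to fixed-size tables, and a decay factor weights recent inter-arrival gaps more heavily than old ones. The class name `PseudoPeriodSketch`, its parameters (`rows`, `width`, `decay`), and the rule of taking the largest per-row estimate are all assumptions made for this example, not the authors' method.

```python
import hashlib
from typing import List, Optional


class PseudoPeriodSketch:
    """Hypothetical fixed-memory sketch of per-item inter-arrival gaps.

    Not the published FPMH algorithm: an assumed, simplified illustration.
    """

    def __init__(self, rows: int = 4, width: int = 1024, decay: float = 0.8) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.rows = rows
        self.width = width
        self.decay = decay
        # One table per hash function: last arrival time and decayed mean gap.
        self.last_seen: List[List[Optional[float]]] = [
            [None] * width for _ in range(rows)
        ]
        self.avg_gap: List[List[Optional[float]]] = [
            [None] * width for _ in range(rows)
        ]

    def _bucket(self, row: int, item: str) -> int:
        # A salted BLAKE2b digest gives one deterministic hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def observe(self, item: str, timestamp: float) -> None:
        """Record one arrival of `item` at `timestamp` (non-decreasing)."""
        for row in range(self.rows):
            col = self._bucket(row, item)
            prev = self.last_seen[row][col]
            if prev is not None:
                gap = timestamp - prev
                old = self.avg_gap[row][col]
                # Exponential decay: older gaps gradually lose their weight.
                self.avg_gap[row][col] = (
                    gap if old is None
                    else self.decay * old + (1.0 - self.decay) * gap
                )
            self.last_seen[row][col] = timestamp

    def estimate_period(self, item: str) -> Optional[float]:
        """Return the estimated pseudo-period, or None if no gap was seen."""
        known = [
            gap
            for row in range(self.rows)
            if (gap := self.avg_gap[row][self._bucket(row, item)]) is not None
        ]
        # Assumed heuristic: a collision with another item usually shortens
        # the gaps seen in a cell, so the largest row estimate is kept.
        return max(known) if known else None


if __name__ == "__main__":
    sketch = PseudoPeriodSketch(rows=4, width=64, decay=0.5)
    # Arrivals roughly every 10 time units, with small jitter.
    for t in [0, 10, 21, 30, 41, 50, 59, 70]:
        sketch.observe("sensor-a", t)
    print(sketch.estimate_period("sensor-a"))
```

Memory in this sketch is bounded by `rows * width` cells regardless of stream length, which mirrors the memory-saving goal stated in the abstract. Using several rows lets an item that collides in one table still be estimated from another.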

Acknowledgement

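Grant : Development of Privacy Protection Technology Using De-identification Techniques in Big Data Environments

Supported by : Institute for Information & communications Technology Promotion (IITP)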
