Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams

  • Kim, Young-Hee (School of Information and Communication Engineering, Sungkyunkwan University)
  • Kim, Won-Young (School of Information and Communication Engineering, Sungkyunkwan University)
  • Kim, Ung-Mo (School of Information and Communication Engineering, Sungkyunkwan University)
  • Published : 2010.03.31

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Because the stream is continuous, knowledge discovery requires algorithms that scan the stream only once. Data mining over data streams should also support a flexible trade-off between processing time and mining accuracy. In many application areas, weighted frequent itemset mining has been suggested as a way to find important frequent itemsets by taking the weights of itemsets into account. In this paper, we present WSFI-Mine (where WSFI stands for Weighted Support Frequent Itemsets), an efficient algorithm that mines frequent itemsets with normalized weights over data streams. We also propose a novel tree structure, the Weighted Support FP-Tree (WSFP-Tree), which stores the crucial information about frequent itemsets in compressed form. Empirical results show that our algorithm outperforms the compared algorithms under the windowed streaming model.
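The abstract does not define normalized weight or weighted support, so the following is only a minimal sketch of the general idea, not the WSFI-Mine algorithm or the WSFP-Tree. It assumes that weights are normalized by dividing by the largest item weight, and that the weighted support of an itemset is its mean normalized weight multiplied by its relative support in the current window. The function names, the `max_len` limit, and the brute-force enumeration are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def normalize_weights(weights):
    """Scale raw item weights into (0, 1] by dividing by the largest weight (assumed scheme)."""
    top = max(weights.values())
    return {item: w / top for item, w in weights.items()}


def weighted_support(itemset, window, norm_w):
    """Assumed definition: mean normalized weight times relative support in the window."""
    count = sum(1 for t in window if itemset <= t)
    mean_w = sum(norm_w[i] for i in itemset) / len(itemset)
    return mean_w * count / len(window)


def mine_window(stream, weights, window_size, min_wsup, max_len=2):
    """Read the stream once, keep only the latest window, and return its
    weighted-frequent itemsets (brute-force enumeration, not a tree)."""
    norm_w = normalize_weights(weights)
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for transaction in stream:
        window.append(frozenset(transaction))
    items = sorted(set().union(*window))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            ws = weighted_support(frozenset(combo), window, norm_w)
            if ws >= min_wsup:
                result[combo] = ws
    return result


# Hypothetical example data: four transactions and three weighted items.
example_stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
example_weights = {"a": 10, "b": 5, "c": 2}
example_result = mine_window(example_stream, example_weights, window_size=3, min_wsup=0.3)
```

With a window of three transactions, only the last three transactions of the example stream contribute to the result. A compressed prefix tree such as the one the abstract proposes would replace the brute-force enumeration here; the sketch only shows how a weight can change which itemsets count as frequent.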
