Approximate Clustering on Data Streams Using Discrete Cosine Transform

  • Yu, Feng
  • Oyana, Damalie
  • Hou, Wen-Chi
  • Wainer, Michael
  • Published: 2010.03.31


This study presents a clustering algorithm that operates on data transformed with the discrete cosine transform (DCT). The algorithm is a grid density-based method that can identify clusters of arbitrary shape. Streaming data are transformed into DCT coefficients and reconstructed as needed for clustering. Experimental results show that the DCT approximates a data distribution efficiently with only a small number of coefficients while preserving the clusters well. The grid-based clustering algorithm works well with DCT-transformed data, demonstrating the viability of the DCT for data stream clustering applications.
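The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a square 2-D grid of cell counts, an orthonormal DCT-II, a triangular low-frequency selection rule (keep coefficients with u + v < m), and clusters defined as 4-connected groups of dense cells. All function names and parameters are hypothetical.

```python
import math
from collections import deque

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C (row k = frequency, column i = cell)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct2(grid):
    """2-D DCT of a square grid: F = C G C^T."""
    c = dct_matrix(len(grid))
    return matmul(matmul(c, grid), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT: G = C^T F C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def truncate(coeffs, m):
    """Assumed selection rule: keep only coefficients with u + v < m."""
    return [[v if u + w < m else 0.0 for w, v in enumerate(row)]
            for u, row in enumerate(coeffs)]

def grid_clusters(density, threshold):
    """Label 4-connected groups of cells whose density >= threshold.

    Returns (labels, cluster_count); label 0 marks sparse cells.
    """
    n = len(density)
    labels = [[0] * n for _ in range(n)]
    current = 0
    for r in range(n):
        for c in range(n):
            if density[r][c] < threshold or labels[r][c]:
                continue
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over dense neighbours
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < n and 0 <= nx < n and not labels[ny][nx]
                            and density[ny][nx] >= threshold):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current
```

In use, a grid of per-cell counts would be passed through `dct2`, reduced with `truncate(coeffs, m)` for storage, and later restored with `idct2` before calling `grid_clusters`. Smaller values of `m` store fewer coefficients at the cost of a coarser reconstruction, so the density threshold may need tuning against the resulting approximation error.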


Grid Density-Based Clustering; Approximate Cluster Analysis; Discrete Cosine Transform; Sampling; Data Reconstruction; Data Compression


