High-Dimensional Clustering Technique using Incremental Projection

  • 이혜명 (Department of Computer Information, Kyungmoon College)
  • 박영배 (Department of Computer Engineering, Myongji University)
  • Published : 2001.12.01


Most clustering algorithms tend to degrade rapidly in high-dimensional spaces. Moreover, high-dimensional data often contain a significant amount of noise, which further reduces the effectiveness of these algorithms. It is therefore necessary to develop clustering techniques adapted to the structure and characteristics of high-dimensional data. In this paper, we propose CLIP, a clustering algorithm that uses projection by linear transformation. CLIP is designed to overcome the efficiency and effectiveness problems of high-dimensional clustering, and it searches for clusters in the subspaces that are closely related to cluster formation. The main idea of the algorithm is based on clustering in each one-dimensional subspace, but we use incremental projection both to recover high-dimensional clusters and to reduce the computational cost significantly. To evaluate the performance of CLIP, we demonstrate its efficiency and effectiveness through a series of experiments on synthetic data sets.
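The abstract gives only the outline of the idea, so the following is a minimal illustrative sketch of clustering by one-dimensional projections applied incrementally. It is not the CLIP algorithm itself. All names (`bin_index`, `dense_bin_runs`, `incremental_projection_clusters`, `demo_points`), the histogram-based density test, the parameters `bins` and `min_count`, and the assumption that every coordinate is normalized to [0, 1] are assumptions made for illustration. The sketch also processes the dimensions in their given order and requires a cluster to be dense in every dimension, so it does not select the relevant subspaces as the paper describes.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def bin_index(value: float, bins: int) -> int:
    """Map a value in [0, 1] to a histogram bin, clamping the edges."""
    return max(0, min(int(value * bins), bins - 1))


def dense_bin_runs(values: Sequence[float], bins: int,
                   min_count: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_bin, last_bin) runs of adjacent dense bins."""
    counts = [0] * bins
    for v in values:
        counts[bin_index(v, bins)] += 1
    runs, start = [], None
    for i, count in enumerate(counts):
        if count >= min_count and start is None:
            start = i                      # a dense run begins
        elif count < min_count and start is not None:
            runs.append((start, i - 1))    # the dense run ended at bin i - 1
            start = None
    if start is not None:
        runs.append((start, bins - 1))
    return runs


def incremental_projection_clusters(points: Sequence[Point], bins: int = 10,
                                    min_count: int = 30) -> List[List[Point]]:
    """Refine candidate clusters one dimension at a time.

    Only the points of a surviving candidate are projected onto the next
    dimension, so each step handles fewer points and the noise in sparse
    bins is discarded early.
    """
    if not points:
        return []
    candidates = [list(points)]
    for dim in range(len(points[0])):
        refined = []
        for candidate in candidates:
            projected = [p[dim] for p in candidate]
            for first, last in dense_bin_runs(projected, bins, min_count):
                refined.append([p for p in candidate
                                if first <= bin_index(p[dim], bins) <= last])
        candidates = refined
    return candidates


def demo_points(seed: int = 0) -> List[Tuple[float, ...]]:
    """Hypothetical synthetic data: two tight 4-D clusters plus uniform noise."""
    rng = random.Random(seed)
    points = []
    for center in (0.2, 0.7):
        for _ in range(200):
            points.append(tuple(rng.gauss(center, 0.02) for _ in range(4)))
    for _ in range(100):
        points.append(tuple(rng.random() for _ in range(4)))
    return points


clusters = incremental_projection_clusters(demo_points(), bins=10, min_count=30)
print([len(c) for c in clusters])
```

Because each candidate is split only along dense one-dimensional intervals, the work per dimension is a single histogram pass over the points that are still candidates, which is one plausible reading of how incremental projection reduces the computational cost.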


