A Post Web Document Clustering Algorithm

A Post-Processing Web Document Clustering Algorithm (Korean title: 후처리 웹 문서 클러스터링 알고리즘)

  • Im, Yeong-Hui (임영희) (Division of Computer, Information and Communication Engineering, Daejeon University)
  • Published: 2002.02.01


Post-clustering algorithms, which cluster the results returned by a Web search engine, have several requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed algorithm, Concept ART, combines concept vectors, which have several advantages in document clustering, with Fuzzy ART, which is known as a real-time clustering algorithm. Moreover, we show that it is applicable to general-purpose clustering as well as to post-clustering.
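The abstract does not say how Concept ART combines its two ingredients, so the sketch below only illustrates each building block separately, in its commonly described form. It is not the paper's algorithm. A concept vector is taken to be the normalized centroid of a cluster's document vectors (following Dhillon and Modha). Fuzzy ART is shown as the usual single-pass procedure with complement coding, a choice function, a vigilance test, and a learning rule (following Carpenter, Grossberg, and Rosen). The function names and the parameter defaults `rho`, `alpha`, and `beta` are assumptions chosen for illustration.

```python
import math


def concept_vector(docs):
    """Normalized centroid of a cluster's document vectors (illustrative)."""
    dim = len(docs[0])
    centroid = [sum(d[k] for d in docs) / len(docs) for k in range(dim)]
    norm = math.sqrt(sum(c * c for c in centroid))
    return [c / norm for c in centroid] if norm > 0 else centroid


def fuzzy_and_norm(x, w):
    """L1 norm of the component-wise minimum (fuzzy AND) of x and w."""
    return sum(min(xi, wi) for xi, wi in zip(x, w))


def fuzzy_art(inputs, rho=0.75, alpha=0.001, beta=1.0):
    """Single-pass Fuzzy ART over vectors whose components lie in [0, 1].

    rho is the vigilance threshold, alpha the choice parameter, and beta
    the learning rate (beta = 1.0 is fast learning). All defaults are
    assumptions for this sketch, not values taken from the paper.
    """
    weights, labels = [], []
    for a in inputs:
        # Complement coding: append (1 - a) to the input.
        i_vec = list(a) + [1.0 - v for v in a]
        # Rank existing categories by the choice function, best first.
        order = sorted(
            range(len(weights)),
            key=lambda j: fuzzy_and_norm(i_vec, weights[j]) / (alpha + sum(weights[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            # Vigilance test: accept the category only if the match is close enough.
            if fuzzy_and_norm(i_vec, weights[j]) / sum(i_vec) >= rho:
                weights[j] = [
                    beta * min(x, w) + (1 - beta) * w
                    for x, w in zip(i_vec, weights[j])
                ]
                chosen = j
                break
        if chosen is None:
            # No category passed the vigilance test: create a new one.
            weights.append(i_vec)
            chosen = len(weights) - 1
        labels.append(chosen)
    return labels, weights


labels, weights = fuzzy_art([[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.12, 0.88]])
print(labels)  # [0, 0, 1, 1]
```

Because each input is processed once and categories are created on demand, the number of clusters does not have to be fixed in advance, which is one reason an ART-style method suits clustering search results as they arrive.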

