A Post Web Document Clustering Algorithm

  • Im, Yeong-Hui (Dept. of Computer Information Communication Engineering, Daejeon University)
  • Published : 2002.02.01

Abstract

Post-clustering algorithms, which cluster the results returned by a Web search engine, have several requirements that differ from those of conventional clustering algorithms. In this paper, we propose a new post-clustering algorithm that satisfies as many of those requirements as possible. The proposed algorithm, Concept ART, combines concept vectors, which have several advantages in document clustering, with Fuzzy ART, which is known as a real-time clustering algorithm. Moreover, we show that it is applicable to general-purpose clustering as well as post-clustering.
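
The abstract does not say how Concept ART joins its two ingredients, so the following Python sketch only illustrates each ingredient separately, in its commonly described form. It is not the paper's algorithm. The function names (`concept_vector`, `fuzzy_art`), the parameter defaults, and the use of complement coding are assumptions made for illustration. Here a concept vector is taken to be the normalized centroid of unit-length document vectors, and Fuzzy ART is the standard choice, vigilance, and learning loop.

```python
import math


def concept_vector(docs):
    """Normalized centroid of document vectors (assumes no all-zero row)."""
    unit = []
    for d in docs:
        norm = math.sqrt(sum(v * v for v in d))
        unit.append([v / norm for v in d])
    dim = len(unit[0])
    mean = [sum(u[k] for u in unit) / len(unit) for k in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_art(inputs, rho=0.7, alpha=0.001, beta=1.0):
    """One pass of Fuzzy ART over inputs whose features lie in [0, 1].

    rho is the vigilance, alpha the choice parameter, beta the learning
    rate. The default values are illustrative assumptions.
    """
    weights, labels = [], []
    for x in inputs:
        # Complement coding keeps the L1 norm of every input constant.
        i = list(x) + [1.0 - v for v in x]
        # Rank existing categories by the choice function, best first.
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        for j in order:
            overlap = fuzzy_and(i, weights[j])
            # Vigilance test: accept the category only if the match is close enough.
            if sum(overlap) / sum(i) >= rho:
                weights[j] = [
                    beta * o + (1.0 - beta) * w for o, w in zip(overlap, weights[j])
                ]
                labels.append(j)
                break
        else:
            # No category passed the vigilance test, so commit a new one.
            weights.append(i)
            labels.append(len(weights) - 1)
    return labels, weights
```

Because each input is handled once and categories are created on demand, a loop of this shape needs no preset cluster count, which is one reason an ART-style method suits clustering search results as they arrive.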