Link-Based Clustering in Blogosphere

  • Song, Suk-Soon (Department of Electronics and Computer Engineering, Hanyang University) ;
  • Yoon, Seok-Ho (Department of Electronics and Computer Engineering, Hanyang University) ;
  • Kim, Sang-Wook (Department of Electronics and Computer Engineering, Hanyang University)
  • Published: 2009.05.25

Abstract

This paper addresses the clustering of blogs and posts in the blogosphere. First, we model the blogosphere as a social network in which blogs and posts are two types of nodes and the actions that blogs take on posts are links. Next, we cluster this network with LinkClus, a link-based algorithm that finds clusters of nodes in a network effectively and efficiently and that we consider the best fit for the blog environment. For more accurate clustering, we propose two refinements: (1) changing the clustering granularity from blogs, whose owners may be interested in several topics, to folders, each of which represents a single topic; and (2) excluding blogs and posts that have very few links, since they are likely to introduce noise. Finally, we verify the effectiveness of the proposed approach through experiments that examine how similar in content the posts and blogs in the same cluster are.
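The two refinements can be illustrated with a minimal Python sketch of the preprocessing that would precede link-based clustering. It is not the authors' implementation and it does not implement LinkClus itself. The input format (one `(blog, folder, post)` tuple per action), the function names `build_links` and `prune_sparse`, the threshold `min_links`, and the choice to repeat the pruning until no sparse node remains are all assumptions made for illustration.

```python
from collections import defaultdict


def build_links(actions):
    """Map actions to folder-post links (refinement 1).

    `actions` is a hypothetical input format: (blog_id, folder_id, post_id)
    tuples, one per action a blog takes on a post. The clustering unit is the
    (blog_id, folder_id) pair rather than the whole blog.
    """
    return {((blog, folder), post) for blog, folder, post in actions}


def prune_sparse(links, min_links=2):
    """Drop folders and posts with fewer than `min_links` links (refinement 2).

    Removing a node can push its neighbours below the threshold, so the pass
    is repeated until the link set stops changing. Both the threshold and the
    repetition are assumptions of this sketch.
    """
    links = set(links)
    while True:
        folder_degree = defaultdict(int)
        post_degree = defaultdict(int)
        for folder, post in links:
            folder_degree[folder] += 1
            post_degree[post] += 1
        kept = {
            (folder, post)
            for folder, post in links
            if folder_degree[folder] >= min_links and post_degree[post] >= min_links
        }
        if kept == links:
            return kept
        links = kept


# Toy data: two folders share posts p1 and p2; two other folders touch only p3.
actions = [
    ("alice", "travel", "p1"),
    ("alice", "travel", "p2"),
    ("alice", "cooking", "p3"),
    ("bob", "trips", "p1"),
    ("bob", "trips", "p2"),
    ("carol", "misc", "p3"),
]
links = build_links(actions)
pruned = prune_sparse(links, min_links=2)
```

In this toy data, the folders `("alice", "cooking")` and `("carol", "misc")` each have a single link, so they are removed together with post `p3`; the remaining folder-post links are what a link-based clustering algorithm would then receive as input.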
