The Topic-Rank Technique for Enhancing the Performance of Blog Retrieval

블로그 검색 성능 향상을 위한 주제-랭크 기법

  • Shin, Hyeon-Il (신현일; Dept. of Computer Science, Chungbuk National University) ;
  • Yun, Un-Il (윤은일; Dept. of Computer Science, Chungbuk National University) ;
  • Ryu, Keun-Ho (류근호; Dept. of Computer Science, Chungbuk National University)
  • Received : 2010.07.15
  • Accepted : 2010.09.28
  • Published : 2011.01.31

Abstract

As interest in blogs as a form of personal media has grown, a variety of ranking algorithms have been proposed for blog search. These algorithms were adapted to the structural features that distinguish blogs from typical web sites: they measure a blog's reputation or popularity from interaction signals such as links, comments, and trackbacks, and reflect that score in the search system. Actual blog search systems, however, use not only blog ranks but also search words, a time factor, and so on. Even so, these factors may not produce desirable results. In this paper, we propose a topic-rank technique that finds the blogs most strongly associated with a topic. The technique ranks the relation between a blog and not only the index terms of its posts but also the topics that represent those posts. With the proposed technique, blogs can be ranked effectively by their correlation with the search words. We compared the precision and coverage ratio of a blog retrieval system that applies the proposed topic-rank technique with those of other systems, and found that the system using the topic-rank technique performed better.

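As interest in blogs, a form of one-person media, has grown, various ranking algorithms related to blog search have been proposed. These algorithms were modified to suit the structural characteristics that set blogs apart from web pages; they quantify a blog's reputation or popularity from the results of interactions carried out through links between blogs, comments, and trackbacks, and reflect it in the search system. In actual blog search, however, not only the rank of the blog itself but also factors such as the relevance between the search words and blog posts and time are used in combination. Yet with only these previously known factors, the quality of the search results can be low. In this paper, we propose a topic-rank technique that can find the blogs most closely related to a blog topic. The technique ranks the relation between a blog and not only the index terms of its posts but also the topics that represent those posts. With the proposed technique, rankings can be assigned effectively in blog search according to the association between the search words and the blogs. Comparing the precision and coverage ratio of a blog search system that applies the proposed topic-rank technique with those of other blog search systems in Korea, we found that the system using the topic-rank technique outperformed the other systems.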
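
The abstract does not give the scoring formula, so the following is only a toy sketch of the general idea it describes: scoring a blog against a search word using both the index terms of its posts and the topics that represent those posts. The function name `topic_rank`, the data layout, the `topic_weight` parameter, and the additive scoring rule are all assumptions made for illustration and are not the authors' method.

```python
def topic_rank(blogs, query, topic_weight=2.0):
    """Rank blogs by their association with `query` (illustrative only).

    blogs: dict mapping a blog name to a list of posts; each post is a dict
           with 'terms' (list of index terms) and 'topics' (list of topic
           labels). This layout is a hypothetical one chosen for the sketch.
    topic_weight: assumed bonus for a post whose topic matches the query.
    Returns a list of (blog, score) pairs, highest score first.
    """
    scores = {blog: 0.0 for blog in blogs}
    for blog, posts in blogs.items():
        for post in posts:
            # Index-term evidence: one point per occurrence of the query term.
            scores[blog] += post["terms"].count(query)
            # Topic evidence: a post whose representative topic matches the
            # query contributes an extra, assumed weight.
            if query in post["topics"]:
                scores[blog] += topic_weight
    # Sort by descending score; break ties by blog name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


blogs = {
    "travel-log": [
        {"terms": ["jeju", "hotel", "jeju"], "topics": ["jeju"]},
        {"terms": ["seoul", "food"], "topics": ["food"]},
    ],
    "daily-notes": [
        {"terms": ["jeju", "weather"], "topics": ["weather"]},
    ],
}

ranking = topic_rank(blogs, "jeju")
for blog, score in ranking:
    print(blog, score)
```

In this sketch, a blog that merely mentions the search word in passing scores lower than one whose posts are actually about it, which is the kind of distinction a term-only ranking would miss.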
