Ranking Quality Evaluation of PageRank Variations

  • Pham, Minh-Duc (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)
  • Heo, Jun-Seok (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)
  • Lee, Jeong-Hoon (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)
  • Whang, Kyu-Young (Dept. of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published : 2009.09.25

Abstract

The PageRank algorithm is an important component for ranking Web pages in Google and other search engines. Although many improvements to the original PageRank algorithm have been proposed, it is unclear which variations (and which combinations of them) provide the "best" ranked results. In this paper, we evaluate the ranking quality of well-known variations of the original PageRank algorithm and of their combinations. To do this, we first classify the variations into link-based approaches, which exploit the link structure of the Web, and knowledge-based approaches, which exploit the semantics of the Web. We then propose algorithms that combine ranking algorithms from these two classes, and we implement both the variations and their combinations. For our evaluation, we perform extensive experiments on a real data set of one million Web pages. Through these experiments, we identify the algorithms, among both the variations and their combinations, that provide the best ranked results.
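
As background for the variations discussed above, the following is a minimal sketch of the original PageRank computation by power iteration. It is an illustration only, not the implementation evaluated in the paper. The function name `pagerank`, the dictionary-based graph representation, and the parameters `damping`, `teleport`, `tol`, and `max_iter` are all assumptions introduced here. The optional `teleport` vector shows one common way a knowledge-based bias (for example, toward trusted or on-topic pages) can be layered on a link-based computation; the abstract does not say whether the paper's combined algorithms work this way.

```python
def pagerank(links, damping=0.85, teleport=None, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    All names and defaults here are illustrative assumptions.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    # A uniform teleport vector gives the original PageRank; a biased vector
    # (mass only on selected pages) gives a personalized variant.
    if teleport is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(teleport.get(p, 0.0) for p in pages)
        teleport = {p: teleport.get(p, 0.0) / total for p in pages}
    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by pages without outlinks is redistributed via teleport.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping + damping * dangling) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank
```

Calling `pagerank(graph)` on a small graph returns a score per page that sums to 1. Passing `teleport={"d": 1.0}` concentrates the random-jump mass on a hypothetical page `d`, which raises the score of `d` and of the pages it links to.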
