A Study on the Effect of Data Fusion on the Retrieval Effectiveness of Web Documents

  • Park, Ok-Hwa (Search Service Team, Search Portal Division, Daum Communications)
  • Chung, Young-Mee (Dept. of Library and Information Science, Yonsei University)
  • Published: 2007.03.31

Abstract

This study investigates the effect of data fusion, a technique recently used as a strategy for improving retrieval performance, on the retrieval effectiveness of Web documents. In an experiment, multiple representations of Web documents were combined: content terms, links, anchor text, and URL. Retrieval results from each single representation were compared with results from the combined representations. The results showed that combining document representation methods in a Web environment did not bring any significant improvement in retrieval effectiveness.
