A Study on the Performance of Structured Document Retrieval Using Node Information

Original Korean title: 노드정보를 이용한 문서검색의 성능에 관한 연구

  • Published: 2007.03.30


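A node is a small, meaningful unit of information that makes up a document. Alongside the use of document structure information in information retrieval, retrieval units smaller than the document are being actively studied. For retrieval experiments using node information, this study used the vector space model and carried out experiments applying various similarity calculation methods, as well as an extended experiment that makes use of structure information. The results showed almost no difference in retrieval performance between the methods of calculating document similarity, while extended node retrieval, which applies structure information, performed best.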

A node is a semantic unit that forms part of a structured document. Information retrieval from structured documents offers an opportunity to go below the document level in search of relevant information, making any element in a structured document a retrievable unit. The node-based document retrieval experiments in this study comprise several similarity calculation methods and an extended node retrieval method that uses structure information. Retrieval performance was hardly influenced by the method used to determine document similarity. The extended node method outperformed the others overall.
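As an illustration of what node-based retrieval in a vector space model can look like, the following is a minimal sketch and not the study's actual implementation. It assumes each document is already split into node texts, weights terms with a simple TF-IDF variant, scores every node against the query by cosine similarity, and then combines node scores into a document score. The function names (`tf_idf_vectors`, `cosine`, `score_documents`), the weighting formula, and the combination options (maximum, sum, mean) are all assumptions chosen for the example; the abstract does not state which similarity calculation methods were compared, and the extended node retrieval using structure information is not modeled here.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized node texts."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per node
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(1 + N / df).
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(scores):
    """Average of a non-empty list of node scores."""
    return sum(scores) / len(scores)


def score_documents(docs, query, combine=max):
    """Score documents from their node scores.

    docs: mapping of document id to a list of node texts (hypothetical input format).
    combine: function that turns a list of node scores into one document score.
    """
    owners, token_lists = [], []
    for doc_id, nodes in docs.items():
        for text in nodes:
            owners.append(doc_id)
            token_lists.append(text.lower().split())
    node_vectors = tf_idf_vectors(token_lists)
    query_vector = Counter(query.lower().split())  # raw term counts for the query
    per_doc = {}
    for doc_id, vec in zip(owners, node_vectors):
        per_doc.setdefault(doc_id, []).append(cosine(query_vector, vec))
    return {doc_id: combine(scores) for doc_id, scores in per_doc.items()}
```

Passing `combine=max`, `combine=sum`, or `combine=mean` changes how node evidence is turned into a document score, which is one way to picture comparing several document similarity calculations over the same node index.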


