Improving Performance of Search Engine Using Category based Evaluation

  • 김형일 (Department of Multimedia, Korea Nazarene University)
  • 윤현님 (Department of Digital Information, Korea Polytechnics, Anseong Women's Campus)
  • Received : 2012.11.27
  • Accepted : 2012.12.20
  • Published : 2013.01.28

Abstract

In the current Internet environment, where the information space is highly complex, the goal of a search engine is to provide exactly the information users want. However, the content-based methods adopted by most search engines are not an effective tool in this environment. Because content-based methods determine web page weights from the morphological characteristics of vocabulary, they discriminate poorly between web pages. To resolve this problem and provide useful information to users, this paper proposes a category-based evaluation method. The method expands the query through semantic relations and measures its similarity to web pages. When weighting web pages, it uses both the user response to retrieved pages and the category of the query, which improves discrimination between web pages. The proposed method can effectively deliver the information users want through a search engine, and its utility was confirmed through various experiments.
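The abstract gives no formulas, so the following is only a minimal sketch of how the three ingredients it names (query expansion through semantic relations, similarity to web pages, and weighting by query category and user response) might fit together. The relation table, the linear combination, the weights `alpha` and `beta`, and the page fields `terms`, `category`, and `user_response` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical semantic relations; a real system might draw these from a thesaurus.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}


def expand_query(query_terms):
    """Return the query terms followed by their semantically related terms."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in RELATED_TERMS.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_page(query_terms, query_category, page, alpha=0.5, beta=0.3):
    """Combine content similarity, category match, and user response.

    alpha and beta are illustrative weights, not values from the paper.
    page["user_response"] is assumed to be a value in [0, 1], for example
    the fraction of past users who found the page useful.
    """
    similarity = cosine_similarity(expand_query(query_terms), page["terms"])
    category_match = 1.0 if page["category"] == query_category else 0.0
    return (
        (1 - alpha - beta) * similarity
        + alpha * category_match
        + beta * page["user_response"]
    )


def rank_pages(query_terms, query_category, pages):
    """Sort pages from highest to lowest combined score."""
    return sorted(
        pages,
        key=lambda p: score_page(query_terms, query_category, p),
        reverse=True,
    )
```

In this sketch, two pages with identical term lists receive the same content similarity, so a purely content-based ranking cannot separate them; the category match and user response terms are what break the tie.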
