A Term Weight Mensuration based on Popularity for Search Query Expansion

Popularity-Based Term Weight Measurement for Search Query Expansion (Korean title)

  • 이정훈 (Department of Computer Engineering, Dongguk University)
  • 전서현 (Department of Computer Engineering, Dongguk University)
  • Received : 2010.02.17
  • Accepted : 2010.06.05
  • Published : 2010.08.15

Abstract

As Internet use has become pervasive in everyday life, people can now retrieve a great deal of information through the web. However, the exponential growth in the amount of information on the web has limited the performance of online search engines, which now return large volumes of unwanted results. As a result, web users need more time and effort than in the past to find the information they need. This paper proposes a method that uses query expansion to bring the wanted information to web users quickly. In experiments where the search subject does not change, Popularity-based Term Weight Mensuration performs better than TF-IDF and Simple Popularity Term Weight Mensuration. When the subject changes during a search, the performance of Popularity-based Term Weight Mensuration changes less than that of the other methods.
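TF-IDF is the baseline the proposed method is compared against. The abstract does not say which TF-IDF variant was used, so the sketch below assumes the textbook form (raw term count times idf = log(N / df)) over pre-tokenized documents. It illustrates only the baseline, not the popularity-based weighting proposed in the paper; the function name `tf_idf_weights` is hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / df), where N is the
    number of documents and df is how many documents contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [
    ["search", "query", "expansion"],
    ["search", "engine", "ranking"],
    ["query", "expansion", "feedback", "query"],
]
w = tf_idf_weights(docs)
```

A term that occurs in every document gets an idf of log(1) = 0 and therefore zero weight, which is why TF-IDF favors terms that discriminate between documents rather than terms that are merely frequent.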

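As Internet use has become commonplace, people can access a great deal of information through the web. As the amount of information has grown rapidly, search engines have reached a limit in search performance, showing users even information they do not need. Users therefore need more time and effort than in the past to find the information they want. This study proposes a method that uses query expansion to quickly find and provide the accurate information users need. When a single search subject is searched without any change of subject, the proposed term weight measurement method performs better than TF-IDF or simple popularity measurement. Moreover, when the subject changes during a search, the proposed method extracts terms related to the new subject faster than the existing methods and measures their weights accurately, with performance similar to that before the subject change.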
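Both abstracts rest on query expansion: once candidate terms have weights, the highest-weighted ones are added to the user's query. The sketch below shows only that final step and assumes the weights are already computed by some scheme (TF-IDF, a popularity score, or another measure); it does not reproduce the paper's popularity-based weighting. The function `expand_query` and the cutoff parameter `k` are hypothetical.

```python
def expand_query(query_terms, term_weights, k=2):
    """Append the k highest-weighted terms not already in the query.

    `term_weights` maps candidate terms to scores from any weighting
    scheme; how those scores are computed is outside this sketch.
    """
    seen = set(query_terms)
    # Only terms the user has not typed are candidates for expansion.
    candidates = [(t, wt) for t, wt in term_weights.items() if t not in seen]
    # Highest weight first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return list(query_terms) + [t for t, _ in candidates[:k]]


weights = {"query": 0.8, "feedback": 1.1, "thesaurus": 0.9, "ranking": 0.2}
expanded = expand_query(["query", "expansion"], weights, k=2)
```

Here `expanded` is `["query", "expansion", "feedback", "thesaurus"]`: "query" is skipped because the user already supplied it, and "ranking" falls below the cutoff. Under this framing, the quality of the expansion depends entirely on how well the weighting scheme ranks the candidates, which is the part the paper's method targets.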
