• Title/Summary/Keyword: Query expansion

Vocabulary Expansion Technique for Advertisement Classification

  • Jung, Jin-Yong;Lee, Jung-Hyun;Ha, Jong-Woo;Lee, Sang-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1373-1387 / 2012
  • Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of the main tasks in contextual advertising, and it is used to retrieve ads that are semantically relevant to the content of web pages. However, traditional text classification methods struggle to achieve satisfactory performance in ads classification because ads contain few term features. In this paper, we propose a novel ads classification method that handles the lack of term features when classifying ads with short text. The proposed method uses a vocabulary expansion technique based on semantic associations among terms learned from large-scale search query logs. The evaluation results show that our method improves the hierarchical F-measure by 4.0% to 9.7% over baseline classifiers without vocabulary expansion.

Performance Evaluation of Re-ranking and Query Expansion for Citation Metrics: Based on Citation Index Databases (인용 지표를 이용한 재순위화 및 질의 확장의 성능 평가 - 인용색인 데이터베이스를 기반으로 -)

  • HyeKyung Lee;Yong-Gu Lee
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.249-277 / 2023
  • The purpose of this study is to explore the potential contribution of citation metrics to improving the search performance of citation index databases. To this end, the study generated ten queries in the field of library and information science and conducted experiments based on relevance assessments, using 3,467 documents retrieved from the Web of Science and 60,734 documents published in 85 SSCI journals in the field of library and information science from 2000 to 2021. The experiments included re-ranking of the top 100 search results using citation metrics and search methods, query expansion experiments using vector space model retrieval systems, and the construction of a citation-based re-ranking system. The results are as follows: 1) Re-ranking using citation metrics differed from Web of Science's performance, acting as independent metrics. 2) Combining query term frequencies and citation counts positively affected performance. 3) Query expansion generally improved performance compared to the vector space model baseline. 4) User-based query expansion outperformed system-based query expansion. 5) Combining citation counts with suitability documents affected ranking within top suitability documents.

Searching Thesaurus Construction with Word Association Test: A Pilot Study (단어연상검사법을 이용한 탐색 시소러스 구축에 관한 실험적 연구)

  • Han Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.40 no.3 / pp.289-304 / 2006
  • The purpose of this pilot study is to construct a searching thesaurus using a word association test in the library and information science field and to confirm its functionality as a searching aid through query expansion experiments. The test results were analyzed into four types of relationship between stimulus words and response words, and the terms of the association thesaurus were compared with the descriptors of an existing thesaurus. The test results show that the word association test is a fruitful method for identifying many related terms and, to some degree, narrower and equivalent terms for the stimulus terms. Furthermore, in the query expansion experiment, the performance of the association thesaurus was better than that of an existing thesaurus. This result demonstrates that a word association thesaurus can be applied to query expansion.

A Study on Keyword Extraction and Expansion for Web Text Retrieval (웹 문서 검색을 위한 검색어 추출과 확장에 관한 연구)

  • Yoon, Sung-Hee
    • Journal of the Korea Computer Industry Society / v.5 no.9 / pp.1111-1118 / 2004
  • A natural language query is the best user interface for users of web text retrieval systems. This paper proposes a retrieval system that expands keywords from the syntactically analyzed structure of a user's natural language query, based on natural language processing techniques. By combining or splitting compound nouns through syntactic tree traversal, and by expanding keywords with variant or shortened forms into multiple keywords, the system enhances the precision and correctness of retrieval.

Syntactic Analysis and Keyword Expansion for Performance Enhancement of Information Retrieval System (정보 검색 시스템의 성능 향상을 위한 구문 분석과 검색어 확장)

  • 윤성희
    • Journal of the Korea Academia-Industrial cooperation Society / v.5 no.4 / pp.303-308 / 2004
  • A natural language query is the best user interface for users of information retrieval systems. This paper proposes a retrieval system that expands keywords from the syntactically analyzed structure of a user's natural language query, based on natural language processing techniques. By combining or splitting compound nouns through syntactic tree traversal, and by expanding keywords with variant or shortened forms into multiple keywords, the system performance was enhanced up to 11.3% precision and 4.7% correctness.

Semantic Query Expansion based on Concept Coverage of a Deep Question Category in QA systems (질의 응답 시스템에서 심층적 질의 카테고리의 개념 커버리지에 기반한 의미적 질의 확장)

  • Kim Hae-Jung;Kang Bo-Yeong;Lee Sang-Jo
    • Journal of KIISE:Databases / v.32 no.3 / pp.297-303 / 2005
  • When confronted with a query, question answering systems endeavor to extract the most exact answers possible by determining the answer type that fits the key terms used in the query. However, the efficacy of such systems is limited by the fact that the terms used in a query may be in a syntactic form different from that of the same words in a document. In this paper, we present an efficient semantic query expansion methodology based on a question category concept list composed of terms that are semantically close to the terms used in a query. The semantically close terms of a query term may be hypernyms, synonyms, or terms in a different syntactic category. The proposed system constructs a concept list for each question type and then builds the concept list for each question category using a learning algorithm. In question answering experiments on 42,654 Wall Street Journal documents from the TREC collection, the traditional system achieved an MRR of 0.223 and the proposed system achieved 0.50, outperforming the traditional question answering system. The results of the present experiments suggest the promise of the proposed method.
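
The MRR (mean reciprocal rank) figures reported above follow the standard definition: for each question, take the reciprocal of the rank of the first correct answer (0 if none is returned), then average over all questions. A minimal sketch of that calculation is given below; the function name and the example ranks are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Average of 1/rank over all questions.

    first_correct_ranks: one entry per question, holding the 1-based rank
    of the first correct answer, or None if no correct answer was returned
    (such a question contributes 0 to the sum).
    """
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)


# Hypothetical ranks for four questions (illustration only).
example_ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(example_ranks)  # (1 + 0.5 + 0 + 0.25) / 4
```

Under this definition, a score of 0.50 would correspond, for instance, to the first correct answer appearing at rank 2 on every question.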

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval (MeSH 기반의 LDA 토픽 모델을 이용한 검색어 확장)

  • You, Sukjin
    • Journal of Korean Library and Information Science Society / v.52 no.1 / pp.79-108 / 2021
  • Information retrieval in the health field poses several challenges. Health information terminology is difficult for consumers (laypeople) to understand, and formulating a query with professional terms is not easy for them because health-related terms are more familiar to health professionals. Automatically adding health terms related to a query would help consumers find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. Documents were represented by the MeSH terms found in the full-text articles (i.e., Bag-of-MeSH), and these MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). Threshold values were effective in an LDA model with a specific number of topics at increasing IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on TP × WP and on retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model compared with the baseline result.
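
The threshold filtering and word scoring described in this abstract can be illustrated with a small sketch. It assumes the LDA model has already produced, for the query context, a list of topics with their topic probability (TP) and per-word probabilities (WP). The function name, the data layout, the threshold values, and the example terms are all hypothetical, and the sketch leaves out the retrieved-document ranking component that the abstract also mentions.

```python
def select_expansion_terms(topics, tp_min, wp_min, k):
    """Pick up to k expansion terms by thresholded TP * WP score.

    topics: list of (topic_probability, {word: word_probability}) pairs.
    Topics with TP below tp_min and words with WP below wp_min are dropped.
    A word appearing in several topics keeps its highest TP * WP score
    (an assumption of this sketch, not a detail from the paper).
    """
    scores = {}
    for tp, words in topics:
        if tp < tp_min:
            continue
        for word, wp in words.items():
            if wp < wp_min:
                continue
            scores[word] = max(scores.get(word, 0.0), tp * wp)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical topic output for one query (illustration only).
example_topics = [
    (0.60, {"hypertension": 0.30, "blood pressure": 0.20, "humans": 0.02}),
    (0.30, {"antihypertensive agents": 0.25, "hypertension": 0.10}),
    (0.05, {"neoplasms": 0.40}),
]
terms = select_expansion_terms(example_topics, tp_min=0.1, wp_min=0.05, k=3)
```

In this example the third topic is discarded by the TP threshold and "humans" by the WP threshold, so only words that are probable within a probable topic remain as expansion candidates.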

A Web-document Recommending System using the Korean Thesaurus (한국어 시소러스를 이용한 웹 문서 추천 에이전트)

  • Seo, Min-Rye;Lee, Song-Wook;Seo, Jung-Yun
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.1 / pp.103-109 / 2009
  • We build a web document recommending agent system that offers a certain number of web documents to each user by monitoring and learning from the user's web browsing actions. We also propose a method of query expansion using the Korean thesaurus. For the queries used to search for new web documents, a candidate set is generated using the Korean thesaurus. From the words in the candidate set, we extract those most strongly correlated with the queries by using TF-IDF and mutual information, and then expand the query. With query expansion, the system can recommend many web documents of potential interest to users. We thus conclude that the system with query expansion is more effective than a base system for recommending web documents to users.
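
One common way to rank thesaurus candidates by their correlation with a query term is pointwise mutual information (PMI) computed from document co-occurrence. The sketch below shows only that idea; it is not the paper's implementation, it omits the TF-IDF component, and the function name and toy documents are hypothetical.

```python
import math


def pmi(query_term, candidate, docs):
    """Pointwise mutual information of two terms over a document collection.

    docs: list of sets of words, one set per document.
    PMI = log( P(q, c) / (P(q) * P(c)) ), with probabilities estimated as
    document frequencies. Returns -inf when the terms never co-occur.
    """
    n = len(docs)
    n_q = sum(1 for d in docs if query_term in d)
    n_c = sum(1 for d in docs if candidate in d)
    n_qc = sum(1 for d in docs if query_term in d and candidate in d)
    if n_qc == 0:
        return float("-inf")
    return math.log((n_qc * n) / (n_q * n_c))


# Toy collection (illustration only).
example_docs = [
    {"car", "automobile", "engine"},
    {"car", "automobile"},
    {"car", "road"},
    {"road", "map"},
    {"banana", "fruit"},
]
candidates = ["road", "fruit", "automobile"]
ranked = sorted(candidates, key=lambda c: pmi("car", c, example_docs), reverse=True)
```

Candidates with the highest scores would be appended to the query. Because PMI favors rare terms, a real system would likely combine it with a frequency-based weight such as TF-IDF, as the abstract indicates.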

An efficient spatio-temporal index for spatio-temporal query in wireless sensor networks

  • Lee, Donhee;Yoon, Kyoungro
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.4908-4928 / 2017
  • Recent research into wireless sensor network (WSN)-related technology that senses various data has recognized the need for spatio-temporal queries for searching necessary data from wireless sensor nodes. Answers to the queries are transmitted from sensor nodes, and for the efficient transmission of the sensed data to the application server, research on index processing methods that increase accuracy while reducing the energy consumption in the node and minimizing query delays has been conducted extensively. Previous research has emphasized the importance of accuracy and energy efficiency of the sensor node's routing process. In this study, we propose an itinerary-based R-tree (IR-tree) to solve the existing problems of spatial query processing methods such as efficient processing and expansion of the query to the spatio-temporal domain.