• Title/Summary/Keyword: Language-Based Retrieval Model

Language Modeling Approaches to Information Retrieval

  • Banerjee, Protima;Han, Hyo-Il
    • Journal of Computing Science and Engineering / v.3 no.3 / pp.143-164 / 2009
  • This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research in the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, use of language modeling for XML retrieval and future trends.
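
To make the generative view in this abstract concrete, here is a minimal Python sketch of query-likelihood ranking, one common instantiation of language modeling for retrieval. The abstract does not prescribe this particular estimator: the unigram model, the Jelinek-Mercer interpolation weight `lam`, and the function names `query_log_likelihood` and `rank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Return log P(query | doc) under a smoothed unigram document model.

    Jelinek-Mercer smoothing (an assumed choice) interpolates the document's
    maximum-likelihood estimate with the collection-wide estimate.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document indices ordered from most to least likely to generate the query."""
    collection_counts = Counter(term for doc in docs for term in doc)
    collection_len = sum(collection_counts.values())
    scored = [
        (query_log_likelihood(query, doc, collection_counts, collection_len, lam), i)
        for i, doc in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Each document is treated as a list of tokens, and documents are ranked by how probable the query is under each document's smoothed model. Smoothing keeps a document from scoring zero just because it lacks one query term.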

Dependency Structure Applied to Language Modeling for Information Retrieval

  • Lee, Chang-Ki;Lee, Gary Geun-Bae;Jang, Myung-Gil
    • ETRI Journal / v.28 no.3 / pp.337-346 / 2006
  • In this paper, we propose a new language model for information retrieval, the dependency structure language model, to compensate for the weaknesses of unigram and bigram language models. The dependency structure language model is based on the first-order dependency model and the dependency parse tree generated by a linguistic parser, so it can naturally capture long-distance dependencies. In extensive experiments, the dependency structure model performed better than recently proposed language models and the Okapi BM25 method, and the dependency structure proved more effective than unigrams and bigrams in language modeling for information retrieval.

An Experimental Study on the Usefulness of Structure Hints in the Leaf Node Language Model-Based XML Document Retrieval (단말노드 언어모델 기반의 XML문서검색에서 구조 제한의 유용성에 관한 실험적 연구)

  • Jung, Young-Mi
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.209-226 / 2007
  • The XML document format on the Web provides a mechanism for expressing both content and logical structure, so an XML processor can access a document's content and its structure. The purpose of this study is to investigate the usefulness of structural hints in leaf node language model-based XML document retrieval. To this end, the experiment tested the performance of a leaf node language model-based XML retrieval system, comparing queries for the same topic that contain content-only constraints with queries that contain both content and structure constraints. A newly designed and implemented leaf node language model-based XML retrieval system was used. We participated in the ad hoc track of INEX 2005 and ran the experiment on the large-scale XML test collection provided by INEX 2005.

An Experimental Study on the Performance of Element-based XML Document Retrieval (엘리먼트 기반 XML 문서검색의 성능에 관한 실험적 연구)

  • Yoon, So-Young;Moon, Sung-Been
    • Journal of the Korean Society for Information Management / v.23 no.1 s.59 / pp.201-219 / 2006
  • This experimental study proposes an element-based XML document retrieval method that surfaces highly relevant elements. The models compared are a divergence method, a smoothing method, and a hierarchical language model. The hierarchical language model proved the most effective for element-based XML document retrieval, improving exhaustivity at some cost to specificity.

Retrieval Model Based on Word Translation Probabilities and the Degree of Association of Query Concept (어휘 번역확률과 질의개념연관도를 반영한 검색 모델)

  • Kim, Jun-Gil;Lee, Kyung-Soon
    • The KIPS Transactions:PartB / v.19B no.3 / pp.183-188 / 2012
  • One of the major challenges for retrieval performance in information retrieval is the word mismatch between users' queries and documents. To solve the word mismatch problem, we propose a retrieval model that combines the degree of association of query concepts with word translation probabilities in a translation-based model. The word translation probabilities are calculated from pairs consisting of a sentence and the sentence that follows it. To validate the proposed method, we ran experiments on the TREC AP test collection. The experimental results show that the proposed model achieved a significant improvement over the language model and outperformed the translation-based language model.
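
The general shape of a translation-based language model can be sketched as follows. This is a generic formulation, P(q|D) = sum over document words w of t(q|w) * P(w|D), and not the authors' exact model: the abstract gives neither their estimation formula nor how the query-concept association is combined with it. The function name `translation_lm_prob` and the toy translation table are hypothetical.

```python
from collections import Counter


def translation_lm_prob(query_term, doc, trans_prob):
    """Return P(query_term | doc) under a generic translation-based model.

    trans_prob maps (query_term, doc_word) pairs to t(query_term | doc_word).
    Pairs missing from the table are treated as probability 0.
    """
    doc_len = len(doc)
    if doc_len == 0:
        return 0.0
    counts = Counter(doc)
    return sum(
        trans_prob.get((query_term, word), 0.0) * count / doc_len
        for word, count in counts.items()
    )


# Hypothetical translation table, for illustration only.
toy_table = {("car", "car"): 0.6, ("car", "automobile"): 0.4}
p = translation_lm_prob("car", ["automobile", "engine"], toy_table)
```

In this toy case the document never contains "car", yet it still receives a nonzero probability through the related word "automobile". That is how translation probabilities address the word mismatch problem.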

IFC Model Data Retrieval and Regeneration Method through Property Set-based Query Language (IFC 속성 데이터기반의 질의어 개발을 통한 모델 정보 검색 및 재생성 방안)

  • Lee, Sang-Ho;Park, Sang I.;Jang, Young-Hoon;Choi, Kyou-Won
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.2 / pp.38-46 / 2017
  • In this study, a query language was developed to support information retrieval and model regeneration for Industry Foundation Classes (IFC)-based civil infrastructure information models. First, the IFC objects that represent structural components, the entities that manage related properties, and the relationships that connect these elements were analyzed from the standpoint of information flow. The results confirmed that end users could have difficulty accessing and comprehending the properties and their relationships in an IFC file. Second, an IfcPropertySet-focused query method and an applicable stand-alone module were proposed with reference to the existing Building Information Model Query Language (BimQL). The applicability of the proposed method was examined through information retrieval and model regeneration on rail and sleeper information models. The most important advantage of the proposed approach is that IFC-based information retrieval can guarantee interoperability between software packages.

Hybrid Video Information System Supporting Content-based Retrieval and Similarity Retrieval (비디오의 의미검색과 유사성검색을 위한 통합비디오정보시스템)

  • Yun, Mi-Hui;Yun, Yong-Ik;Kim, Gyo-Jeong
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2031-2041 / 1999
  • In this paper, we present HVIS (Hybrid Video Information System), which supports semantic retrieval for a wide range of users by integrating feature-based retrieval and annotation-based retrieval of unstructured, massive video data. HVIS divides a video into video document, sequence, scene, and object to model the metadata, and proposes the Two-layered Hybrid Object-oriented Metadata Model (THOMM), which is composed of a raw-data layer for the physical video stream and a metadata layer that supports annotation-based retrieval, content-based retrieval, and similarity retrieval. Based on this model, we present a video query language that makes annotation-based, content-based, and similarity queries possible, along with a Video Query Processor and a query processing algorithm. In particular, we present a similarity expression that represents the degree of similarity while taking the user's interests into account. The proposed system is implemented with Visual C++, ActiveX, and ORACLE.

Design and Implementation of BADA-IV/XML Query Processor Supporting Efficient Structure Querying (효율적 구조 질의를 지원하는 바다-IV/XML 질의처리기의 설계 및 구현)

  • 이명철;김상균;손덕주;김명준;이규철
    • The Journal of Information Technology and Database / v.7 no.2 / pp.17-32 / 2000
  • As XML emerges as the next-generation standard language for electronic documents on the Internet, the number of XML documents containing vast amounts of information is increasing substantially, both through the conversion of existing documents to XML and through the creation of new XML documents. Consequently, an XML document retrieval system becomes essential for searching the large quantity of XML documents that are stored in and managed by a DBMS. In this paper, we describe the design and implementation of the BADA-IV/XML query processor, which supports content-based, structure-based, and attribute-based retrieval. We design an XML query language that is based on the W3C's XQL (XML Query Language) and tightly coupled with OQL (a query language for object-oriented databases). XML documents are stored and maintained in BADA-IV, an object-oriented database management system developed by ETRI (Electronics and Telecommunications Research Institute). The storage data model is based on DOM (Document Object Model), so retrieval of XML documents is basically executed through DOM tree traversal. We improve search performance using a Node ID, which represents a node's hierarchy information in an XML document. Assuming that the DOM tree is a complete k-ary tree, we show that the Node ID technique is superior to DOM tree traversal in terms of node fetch counts.
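
The benefit of numbering nodes in a complete k-ary tree can be illustrated with a short Python sketch. The abstract does not specify the authors' exact Node ID scheme, so this assumes one common convention: level-order numbering starting at 1 for the root. The function names `parent_id`, `child_ids`, and `is_ancestor` are hypothetical.

```python
def parent_id(node_id, k):
    """Return the parent's ID in a complete k-ary tree, or None for the root (ID 1)."""
    if node_id <= 1:
        return None
    return (node_id - 2) // k + 1


def child_ids(node_id, k):
    """Return the IDs of the k child slots of node_id, left to right."""
    first = k * (node_id - 1) + 2
    return list(range(first, first + k))


def is_ancestor(ancestor, descendant, k):
    """Decide ancestry by arithmetic on IDs alone, without fetching any node."""
    current = parent_id(descendant, k)
    # IDs strictly decrease on the way up, so stop once we pass the candidate.
    while current is not None and current >= ancestor:
        if current == ancestor:
            return True
        current = parent_id(current, k)
    return False
```

Under this assumed numbering, structural relationships such as parent, child, and ancestor are computed from the IDs themselves. This is one plausible reading of how an ID-based approach could reduce node fetch counts compared with walking the DOM tree.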

Topic Level Disambiguation for Weak Queries

  • Zhang, Hui;Yang, Kiduk;Jacob, Elin
    • Journal of Information Science Theory and Practice / v.1 no.3 / pp.33-46 / 2013
  • Despite limited success, today's information retrieval (IR) systems are not intelligent or reliable. IR systems return poor search results when users formulate their information needs as incomplete or ambiguous queries (i.e., weak queries). Therefore, one of the main challenges in modern IR research is to provide consistent results across all queries by improving performance on weak queries. However, existing IR approaches such as query expansion are not very effective because they make little effort to analyze and exploit the meanings of the queries. Furthermore, word sense disambiguation approaches, which rely on textual context, are ineffective against weak queries, which are typically short. Motivated by the demand for a robust IR system that can consistently provide highly accurate results, the proposed study implemented a novel topic detection method that leveraged both the language model and the structural knowledge of Wikipedia, and systematically evaluated the effect of query disambiguation and topic-based retrieval approaches on TREC collections. The results not only confirm the effectiveness of the proposed topic detection and topic-based retrieval approaches but also demonstrate that query disambiguation does not improve IR as expected.

A Study on the Korean University Students' Usage of Foreign Language Queries in Scholarly Information Retrieval (학술정보검색을 위한 국내 대학생의 외국어 탐색문 활용에 관한 연구)

  • Lee, Bo Eun;Lee, Jee Yeon
    • Journal of the Korean Society for Information Management / v.36 no.1 / pp.95-116 / 2019
  • This study focused on understanding how Korean university students (both undergraduates and graduates) use foreign languages for scholarly information retrieval, particularly the different search strategies they employ depending on their characteristics. A new model was developed based on Ellis's behavioral model of information seeking strategies. The research applied both quantitative and qualitative methods to analyze the data. The students used a variety of foreign language information seeking strategies at different stages of academic information retrieval, depending on their field of study or level of education. The liberal arts and social science students had more difficulty selecting proper search terms in the foreign language than the science and technology students, and this difficulty made them less inclined to use foreign language queries. The students relied more on bibliographic and citation information when using foreign language queries than when using Korean queries. The research outcomes should provide some guidelines on how Korean university libraries can offer information literacy programs and other services based on their patrons' characteristics.