• Title/Summary/Keyword: Document information retrieval

Search Result 410, Processing Time 0.033 seconds

Evaluation of the Newspaper Library -With Emphasis on the Document Delivery Capability and Retrieval Effectivenss- (신문사 자료실에 대한 평가 -문헌전달능력과 검색효율을 중심으로-)

  • 노동조
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.7 no.1
    • /
    • pp.319-351
    • /
    • 1994
  • This rearch is a case study for the newspaper libraries in Seoul and the primary purpose of the this study are to investigate its document delivery capability. To achieve the above-mentioned purpose, representative rsers visited seven the newspaper library and checked their searching time. Document delivery capability was checked by units of hour, minute, second(searching time). Retrieval effectiveness was tested through the recall ratio and the precision ratio. The major findings of the study are summarized as follows: 1) Most of the newspaper libraries excellent to the document delivery capability; 6 newspaper libraries deliverived the data related subject. 2) The newspaper libraries were came out 50.1% the mean recall ratio and 84.8% the mean precision ratio about the all materials. 3) Concerned their own articles, the newspaper libraries showed 71.4% the recall ratio and 90.0% the precision ratio. That moaned their own articles were more effectived than others. 4) The Kookmin Ilbo library had the most excellent system, and the precision ratio of The Dong-A Ilbo library prior to the recall ratio. The Han Kyoreh Shinmun library had a excellent arragement in own articles, but The Segye Times library had problem in every parties.

  • PDF

Document ranking methods using term dependencies from a thesaurus (시소러스의 연관성 정보를 이용한 문서의 순위 결정 방법)

  • 이준호
    • Journal of the Korean Society for information Management
    • /
    • v.10 no.2
    • /
    • pp.3-22
    • /
    • 1993
  • In recent years various document ranking methods such as Relevance. R-Distance and K-Distance have been developed wh~ch can be used in thesaurus-based boolean retrieval systems. They give high quality document rankings in many cases by using term dependence lnformatlon from a thesaurus. However, they suffer from several problems resulting from inefficient and Ineffective evaluation of boolean operators AND. OR and NOT. In this paper we propose new thesaurus-based document ranking methods called KB-FSM and KB-EBM by exploitmg the enhanced fuzzy set model and the extended boolean model. The proposed methods overcome the problems of the previous methods and use term dependencies from a thesaurs effectively. We also show through performance comparison that KB-FSM and KBEBM provide higher retrieval effectiveness than Relevance. R-D~stance and K-Distance.

  • PDF

A Storage and Retrieval System for Structured SGML Documents using Grove (Grove를 이용한 구조적 SGML문서의 저장 및 검색)

  • Kim, Hak-Gyoon;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.5
    • /
    • pp.501-509
    • /
    • 2002
  • SGML(ISO 8879) has been proliferated to support various document styles and to transfer documents into different platforms. SGML documents have logical structure information in addition to contents. As SGML documents are widely used, there is an increasing need for database storage and retrieval system using the logical structure of documents. However. traditional search engines using document indexes cannot exploit the logical structure. In this Paper, we have developed an SGML document storage system, which is DTD-independent and store the document type and the document instance separately by using Grove which is the document model for DSSSL and HyTime. We have used the Object Store, an object-oriented DBMS, to store the structure information appropriately without any loss of structural information. Also, we have supported a index structure for search efficiency like the relational DBMS, and constructed an effective user interface which combines content-based search with structure-based search.

A Re-Ranking Retrieval Model based on Two-Level Similarity Relation Matrices (2단계 유사관계 행렬을 기반으로 한 순위 재조정 검색 모델)

  • 이기영;은희주;김용성
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.11
    • /
    • pp.1519-1533
    • /
    • 2004
  • When Web-based special retrieval systems for scientific field extremely restrict the expression of user's information request, the process of the information content analysis and that of the information acquisition become inconsistent. In this paper, we apply the fuzzy retrieval model to solve the high time complexity of the retrieval system by constructing a reduced term set for the term's relatively importance degree. Furthermore, we perform a cluster retrieval to reflect the user's Query exactly through the similarity relation matrix satisfying the characteristics of the fuzzy compatibility relation. We have proven the performance of a proposed re-ranking model based on the similarity union of the fuzzy retrieval model and the document cluster retrieval model.

A Database Approach for Modeling and Querying XML Documents

  • Panseop Shin;Kim, Jeong-Eun;Lee, Jaeho;Haechull Lim
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.703-706
    • /
    • 2000
  • In recent years. XML applications are being developed in diverse area. Especially, development of XML document repository system associated with database is carrying out widely. The previous researches of XML repository system have several defects which are update and retrieval limitations for the XML document, design limitation for a formal retrieval algorithm and data redundancy. In order to solve the above problems. in this paper, we suggest relational database schemes for overcoming limitations of updating, retrieving, and rebuilding document. And suggest query translation strategy using two-phase translation that consists of pattern analyzing phase and SQL generating phase.

  • PDF

A Automatic Document Summarization Method based on Principal Component Analysis

  • Kim, Min-Soo;Lee, Chang-Beom;Baek, Jang-Sun;Lee, Guee-Sang;Park, Hyuk-Ro
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.491-503
    • /
    • 2002
  • In this paper, we propose a automatic document summarization method based on Principal Component Analysis(PCA) which is one of the multivariate statistical methods. After extracting thematic words using PCA, we select the statements containing the respective extracted thematic words, and make the document summary with them. Experimental results using newspaper articles show that the proposed method is superior to the method using either word frequency or information retrieval thesaurus.

Implementing and Evaluating an Empirical Variable Retrieval System : The Entity-Relationship and Relational Approach (실험변수를 이용한 정보검색 시스템의 구축 및 평가 : 개체-관계 모델과 관계형 데이터베이스를 이용한 접근)

  • Oh Sam-Gyun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.32 no.4
    • /
    • pp.53-67
    • /
    • 1998
  • This article investigates the potentialities of using empirical variables and their associated statistical relationships in document representation and retrieval. To this end, a newly devised empirical fact retrieval system was evaluated in comparison to a simulated traditional retrieval system involving a set of predetermined empirical queries. Results indicate that the EFRS generally outperformed the TRS in terms of the precision, search effort, and measures of user satisfaction.

  • PDF

Interactive Searching Behavior with Elements-Based on XML Documents Retrieval System (엘리먼트 기반 XML 검색 시스템에서의 이용자의 정보 탐색 행태 연구)

  • Jung, Young-Mi
    • Journal of Korean Library and Information Science Society
    • /
    • v.40 no.4
    • /
    • pp.159-176
    • /
    • 2009
  • The aim of this study was to investigate the users' behaviour when interacting with elements based on XML documents retrieval system and develop approaches for XML retrieval which are effective in user-based environment. We followed the experimental guidelines from the INEX 2006 iTrack organizers. For the research goals, 16 responses from the questionnaires per subject and system logs were collected and analyzed using Excel and SPSS 17.0.

  • PDF

Query Formulation for Heuristic Retrieval in Obfuscated and Translated Partially Derived Text

  • Kumar, Aarti;Das, Sujoy
    • Journal of Information Science Theory and Practice
    • /
    • v.3 no.1
    • /
    • pp.24-39
    • /
    • 2015
  • Pre-retrieval query formulation is an important step for identifying local text reuse. Local reuse with high obfuscation, paraphrasing, and translation poses a challenge of finding the reused text in a document. In this paper, three pre-retrieval query formulation strategies for heuristic retrieval in case of low obfuscated, high obfuscated, and translated text are studied. The strategies used are (a) Query formulation using proper nouns; (b) Query formulation using unique words (Hapax); and (c) Query formulation using most frequent words. Whereas in case of low and high obfuscation and simulated paraphrasing, keywords with Hapax proved to be slightly more efficient, initial results indicate that the simple strategy of query formulation using proper nouns gives promising results and may prove better in reducing the size of the corpus for post processing, for identifying local text reuse in case of obfuscated and translated text reuse.

Document Clustering Method using Coherence of Cluster and Non-negative Matrix Factorization (비음수 행렬 분해와 군집의 응집도를 이용한 문서군집)

  • Kim, Chul-Won;Park, Sun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.12
    • /
    • pp.2603-2608
    • /
    • 2009
  • Document clustering is an important method for document analysis and is used in many different information retrieval applications. This paper proposes a new document clustering model using the clustering method based NMF(non-negative matrix factorization) and refinement of documents in cluster by using coherence of cluster. The proposed method can improve the quality of document clustering because the re-assigned documents in cluster by using coherence of cluster based similarity between documents, the semantic feature matrix and the semantic variable matrix, which is used in document clustering, can represent an inherent structure of document set more well. The experimental results demonstrate appling the proposed method to document clustering methods achieves better performance than documents clustering methods.