• Title/Summary/Keyword: Document information retrieval

The Document Clustering using LSI of IR (LSI를 이용한 문서 클러스터링)

  • 고지현;최영란;유준현;박순철
    • Proceedings of the Korea Society for Industrial Systems Conference / 2002.06a / pp.330-335 / 2002
  • The most critical issue in an information retrieval system is returning adequate results for user requests. When every document related to a user query is retrieved, it is difficult for the user to find the exact document wanted. For this reason, clustering methods that group related documents have been widely used. In this paper, we cluster documents on the basis of meaning rather than the index terms they contain, and apply the LSI (latent semantic indexing) method for this purpose. Furthermore, we analyze how the results differ from clustering with the widely used K-Means algorithm.
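
The idea of clustering by meaning rather than by index term can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the choice of `k = 2`, the 0.9 threshold, and the helper `group_by_similarity` are all hypothetical, and the greedy grouping stands in for a real clustering algorithm such as K-Means.

```python
import numpy as np

# Hypothetical toy corpus: three car documents, two fruit documents.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "car automobile engine",
    "apple banana fruit",
    "banana fruit juice",
]

vocab = sorted({word for doc in docs for word in doc.split()})
# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(term) for doc in docs] for term in vocab],
             dtype=float)

# LSI step: truncated SVD keeping the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# Cosine similarity between documents in the latent space.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
sim = unit @ unit.T


def group_by_similarity(sim, threshold=0.9):
    """Greedy grouping: each unlabeled document seeds a group and absorbs
    later unlabeled documents whose similarity to it meets the threshold."""
    n = sim.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            labels[i] = next_label
            for j in range(i + 1, n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = next_label
            next_label += 1
    return labels


labels = group_by_similarity(sim)
print(labels)
```

In raw term space, the first two documents have a cosine similarity of about 0.67 because one says "car" and the other "automobile". In the two-dimensional latent space they become nearly identical, which is the effect the abstract attributes to clustering by meaning.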

Improving the Performance of the User Creative Contents Retrieval Using Content Reputation and User Reputation (콘텐츠 명성 및 사용자 명성 평가를 이용한 UCC 검색 품질 개선)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of the Korea Society for Simulation / v.19 no.1 / pp.83-90 / 2010
  • We describe a novel method for improving the performance of UCC retrieval using content reputation and user reputation. UCC retrieval is a part of information retrieval. The goal of an information retrieval system is to find the documents users want; likewise, the goal of a UCC retrieval system is to find UCCs themselves rather than documents. Unlike a document, a UCC does not carry enough textual information. Therefore, we use content reputation and user reputation, both based on non-textual information, to improve retrieval performance. We evaluate content reputation using information about the UCC itself and the social activities between users related to UCCs. We evaluate user reputation using individual social activities between users, or between users and UCCs. We build a network of users and UCCs from these social activities and then derive user reputation from the network with graph algorithms. We collect information about users and UCCs from YouTube and implement two systems, one using content reputation and one using user reputation, and compare them. The experimental results show that the system using content reputation outperforms the system using user reputation. We expect this result to be useful for UCC retrieval in the future.
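
The abstract does not say which graph algorithm derives user reputation from the network. As one plausible illustration, the sketch below applies a PageRank-style power iteration to a small directed graph. The choice of PageRank, the edge semantics (an edge from A to B meaning "A endorsed B"), the user names, and the damping value are all assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    outgoing = {node: [] for node in nodes}
    for source, target in edges:
        outgoing[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same base share, then link contributions.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # A node with no outgoing edges spreads its rank over all nodes.
            targets = outgoing[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical social activity: an edge (a, b) means user a endorsed user b.
edges = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]

reputation = pagerank(edges)
top_user = max(reputation, key=reputation.get)
print(top_user)
```

A user endorsed by many others ends up with the highest score, and a user endorsed by a highly ranked user inherits part of that rank. A content-reputation variant would need UCC nodes and UCC-specific signals, which this sketch leaves out.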

A Design and Implementation for Data Sharing Interface in based XML (XML 기반 데이터 공유 Interface 설계 및 구현)

  • 김철원;김상영;박종훈
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2004.05b / pp.424-428 / 2004
  • Research on systems that store and search XML documents is active, and many of these systems emphasize storing and searching XML documents efficiently. Such systems use tables or storage structures designed specifically for XML documents, store a document's structural information together with its content, and on that basis can efficiently perform content retrieval or structure search over XML documents. In this paper, we design and implement an interface that converts the data held in the many different kinds of databases currently in use into XML files, so that the data can be reused and shared on the Web and output can be produced in XML format through various interfaces.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • An information need is one of the main motivations for a person to use a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user may find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning, and has high morphological complexity. This paper therefore reviews QE for Arabic information retrieval, with the intention of identifying the recent state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, which include document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for the Arabic language requires additional investigation and research because of the intricate nature of the language.

A study on evaluation of information retrieval system (정보검색(情報檢索)시스템의 평가(評価)에 관한 연구(硏究))

  • Park, In-Ung
    • Journal of the Korean BIBLIA Society for library and Information Science / v.5 no.1 / pp.85-105 / 1981
  • Information is an essential factor driving the rapid progress that characterizes modern society. As more information is required and as more is supplied by individuals, governmental units, businesses, and educational institutions, the need for efficient methods of communication grows. One possibility for improving the information dissemination process is to use computers, and the capabilities of such machines are beginning to be used for information storage, retrieval, and dissemination. An important problem that must be carefully examined is whether one information retrieval technique is better or worse than another. This paper examines the problem of how to evaluate an information retrieval system. One specific approach is a cost accounting model for studying how to minimize the cost of operating a mechanized retrieval system. Through cost analysis, the model provides a method for comparative evaluation between systems. The general cost accounting models of the literature retrieval system designed in this study are given below. (1) Total cost accounting model: total cost of the literature retrieval system = (cost per unit of user time × amount of user time) + (cost per unit of system time × amount of system time). (2) System cost accounting model: system cost = (pre-search system cost per unit of time × time) + (search system cost per unit of time × time) + (post-search system cost per unit of time × time), where (a) pre-search system cost per unit of time = cost of channel per unit time + cost of central processing unit per unit time + cost of storage per unit time; (b) search system cost per unit of time = comparison cost + document representation cost; and (c) post-search system cost per unit of time = cost of channel per unit time + cost of central processing unit per unit time + cost of storage per unit time. (3) User cost accounting model: total user cost = [pre-search user cost per unit of time × (time + additional time)] + [search user cost per unit of time × (time + additional time)] + [post-search user cost per unit of time × (time + additional time)].
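
The cost accounting formulas in this abstract can be written out as a short calculation. The sketch below is only an illustration: the `Phase` class, the function names, and every numeric value are hypothetical, and the total is taken as system cost plus user cost summed over the three phases.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """Costs and times for one phase (pre-search, search, or post-search)."""
    system_rate: float            # system cost per unit of time
    system_time: float
    user_rate: float              # user cost per unit of time
    user_time: float
    user_extra_time: float = 0.0  # the "additional time" term


def system_cost(phases):
    # Sum of (system cost per unit of time x time) over all phases.
    return sum(p.system_rate * p.system_time for p in phases)


def user_cost(phases):
    # Sum of (user cost per unit of time x (time + additional time)).
    return sum(p.user_rate * (p.user_time + p.user_extra_time) for p in phases)


def total_cost(phases):
    return system_cost(phases) + user_cost(phases)


# Hypothetical per-unit-time component costs.
channel, cpu, storage = 1.0, 2.0, 1.0
comparison, representation = 3.0, 2.0

phases = [
    # Pre-search: channel + CPU + storage.
    Phase(system_rate=channel + cpu + storage, system_time=2.0,
          user_rate=10.0, user_time=2.0, user_extra_time=1.0),
    # Search: comparison + document representation.
    Phase(system_rate=comparison + representation, system_time=1.0,
          user_rate=10.0, user_time=1.0),
    # Post-search: channel + CPU + storage.
    Phase(system_rate=channel + cpu + storage, system_time=1.0,
          user_rate=10.0, user_time=2.0, user_extra_time=0.5),
]

print(system_cost(phases), user_cost(phases), total_cost(phases))
# 17.0 65.0 82.0
```

With these made-up figures the user side dominates the total, which shows how the model could be used to compare two systems that differ mainly in how much user time they demand.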

Applying document routing mode of information access in nursing diagnosis process (문서 라우팅 기법을 이용한 간호진단 과정에서의 정보접근)

  • Paik Woo-Jin
    • Proceedings of the Korean Society for Information Management Conference / 2006.08a / pp.163-168 / 2006
  • The nursing diagnosis process is described as nurses assessing patients' conditions by applying reasoning and looking for patterns that fit the defining characteristics of one or more diagnoses. This process is similar to using a typical document retrieval system if we consider the patients' conditions as queries, the nursing diagnoses as documents, and the defining characteristics as index terms of the documents. However, in a typical hospital setting there is a small, fixed number of nursing diagnoses and an infinite number of patients' conditions. This situation is better suited to the document routing mode of information access, in which a number of archived profiles are compared to individual documents. In this paper, we describe a ROUting-based Nursing Diagnosis (ROUND) system and its Natural Language Processing-based query processing component, which converts the defining characteristics of nursing diagnoses into query representations.

Resampling Feedback Documents Using Overlapping Clusters (중첩 클러스터를 이용한 피드백 문서의 재샘플링 기법)

  • Lee, Kyung-Soon
    • The KIPS Transactions:PartB / v.16B no.3 / pp.247-256 / 2009
  • Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
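
The resampling idea can be illustrated with a heavily simplified sketch. It is not the paper's relevance-model method. The assumptions here are that each top-retrieved document forms a cluster with its nearest neighbor, that similarity is Jaccard overlap of terms, and that expansion terms are chosen by raw counts. All function names and the toy documents are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def overlapping_clusters(docs, n_neighbors=1):
    """One cluster per document: the document plus its nearest neighbors."""
    clusters = []
    for i, doc in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        # Stable sort: ties keep the original retrieval order.
        others.sort(key=lambda j: jaccard(doc, docs[j]), reverse=True)
        clusters.append([i] + others[:n_neighbors])
    return clusters


def resample(docs, n_neighbors=1):
    """Flatten the clusters, so a document in many clusters repeats."""
    return [i for cluster in overlapping_clusters(docs, n_neighbors)
            for i in cluster]


def expansion_terms(docs, sample, query, n_terms):
    """Most frequent non-query terms over the resampled documents."""
    counts = Counter()
    for i in sample:
        counts.update(term for term in docs[i] if term not in query)
    return [term for term, _ in counts.most_common(n_terms)]


# Hypothetical top-retrieved documents for the query "jaguar speed".
query = {"jaguar", "speed"}
docs = [
    "jaguar speed car engine".split(),
    "jaguar speed car engine turbo".split(),
    "jaguar cat speed animal".split(),   # off-topic noise
    "jaguar car speed road".split(),
]

sample = resample(docs)
terms = expansion_terms(docs, sample, query, n_terms=3)
print(terms)
# ['car', 'engine', 'turbo']
```

The first document belongs to all four clusters and is therefore counted four times, while the off-topic document is counted once. That repetition is the sense in which dominant documents are fed back repeatedly to emphasize the core topic of the query.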

A Study on Hypermedia Document Retrieval System Using HyTime (HyTime을 이용한 하이퍼미디어 문헌 검색시스템 연구)

  • 정혜욱;문성빈
    • Proceedings of the Korean Society for Information Management Conference / 1997.08a / pp.83-86 / 1997
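  • In this study, we structured hypermedia documents using HyTime, the international standard for representing the logical structure of hypermedia and time-dependent documents, and implemented a hypermedia document retrieval system that supports keyword search based on that structural information.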

Document Structuring and Text Retrieval Using SGML (SGML을 이용한 문헌의 구조화 및 텍스트 검색에 관한 연구)

  • 오민경;정영미
    • Proceedings of the Korean Society for Information Management Conference / 1995.08a / pp.29-32 / 1995
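  • In this paper, we built a text retrieval system using SGML (Standard Generalized Markup Language). SGML is a generalized markup language that views a document as composed of object units called document elements and expresses the relationships among those elements. Using SGML in a text retrieval system therefore makes it possible to structure documents and to organize and search full text efficiently.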
