• Title/Summary/Keyword: Document Retrieval

Research on Function and Policy for e-Government System using Semantic Technology (전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구)

  • Go, Gwang-Seop;Jang, Yeong-Cheol;Lee, Chang-Hun
    • 한국디지털정책학회:학술대회논문집 / 2007.06a / pp.79-87 / 2007
  • This paper offers a solution based on semantic document classification to improve e-Government utilization and efficiency for people who use their own information retrieval systems and linguistic expressions. Generally, semantic document classification is an approach that classifies documents based on the diverse relationships between keywords in a document, without fully describing the hierarchical concepts between keywords. Our approach considers the deep meanings within the context of the document and radically enhances information retrieval performance. The Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the concept hierarchy of various concepts, is proposed, tested, and evaluated. The CoWDC test results are used to verify the superiority of semantic retrieval technology. To integrate it efficiently into e-Government, the study recognizes the need for creation of a thesaurus, management of the operating system, expansion of the knowledge base, and improvements in search service and accuracy at the national level.

An Experimental Study on Fuzzy Document Retrieval System (퍼지개념을 적용한 질의식의 분석과 문헌정보 검색에 관한 연구)

  • Lee Seung Chai
    • Journal of the Korean Society for Library and Information Science / v.21 / pp.249-290 / 1991
  • Theoretical developments in information retrieval have offered a number of alternatives to traditional Boolean retrieval. Probability theory and fuzzy set theory have played prominent roles here. Fuzzy set theory is an attempt to generalize traditional set theory by permitting partial membership in a set, which means recognizing different degrees to which a document can match a request. This study describes an experiment with a document retrieval system that uses a fuzzy relation matrix of keywords, and reports the results. Queries composed of keywords and the Boolean operators AND, OR, and NOT were processed by the retrieval method, which was implemented as an experimental system on a 32-bit PC (30 MHz). Measurements of the recall ratio and precision ratio verified the effectiveness of the proposed fuzzy relation matrix of keywords and the retrieval method. Compared with the traditional crisp method on the same document database, the recall ratio increased by 10%, although the precision ratio decreased slightly. Two problems remain to be resolved: first, the design of automatic data input and fuzzy indexing modules, which would make the system competitive and useful; second, a systematic procedure for assigning fuzzy weights to keywords in documents and in queries.
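
The idea of partial membership in the abstract above can be illustrated with a minimal sketch. It assumes the standard fuzzy set operators (minimum for AND, maximum for OR, complement for NOT) rather than the paper's fuzzy relation matrix, which the abstract does not specify. The index contents, the query, and all function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical fuzzy index: document -> keyword -> membership degree in [0, 1].
FUZZY_INDEX: Dict[str, Dict[str, float]] = {
    "doc1": {"fuzzy": 0.9, "retrieval": 0.7},
    "doc2": {"fuzzy": 0.2, "retrieval": 0.8, "boolean": 0.6},
    "doc3": {"boolean": 0.9},
}


def membership(doc: str, keyword: str) -> float:
    """Degree to which a keyword describes a document (0.0 if absent)."""
    return FUZZY_INDEX[doc].get(keyword, 0.0)


def fuzzy_and(*degrees: float) -> float:
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    return max(degrees)


def fuzzy_not(degree: float) -> float:
    return 1.0 - degree


def score(doc: str) -> float:
    """Evaluate the example query: fuzzy AND retrieval AND NOT boolean."""
    return fuzzy_and(
        membership(doc, "fuzzy"),
        membership(doc, "retrieval"),
        fuzzy_not(membership(doc, "boolean")),
    )


def rank() -> List[str]:
    """Order documents by decreasing degree of match."""
    return sorted(FUZZY_INDEX, key=score, reverse=True)
```

A crisp Boolean system would return only doc1 for this query. The fuzzy scores also give doc2 a non-zero degree of match, which is one way recall can rise while precision falls.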

The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents (비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현)

  • Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.23-33 / 2014
  • Research in information retrieval is shifting from quickly finding large amounts of information to accurately finding the information sought. Personalization and semantic web technology are key technologies. Automatic indexing of web documents and throughput have moved beyond the research stage and now appear in practical services. However, there is little research on information retrieval for attached documents, as opposed to web documents. In this paper, we describe a method that analyzes the text content of unstructured documents in text, Word, and HWP formats and constructs an OWL ontology from it. We build the TBox of the document ontology, select the resources that can be obtained from a document, and implement a system that uses them as instances of the constructed document ontology. Automatic ontology construction for this kind of unstructured document can be used effectively in information retrieval and document management systems that apply semantic technology to such documents.

An Experimental Study on the Performance of Element-based XML Document Retrieval (엘리먼트 기반 XML 문서검색의 성능에 관한 실험적 연구)

  • Yoon, So-Young;Moon, Sung-Been
    • Journal of the Korean Society for Information Management / v.23 no.1 s.59 / pp.201-219 / 2006
  • This experimental study suggests an element-based XML document retrieval method that reveals highly relevant elements. The models compared are the divergence method, the smoothing method, and the hierarchical language model. In conclusion, the hierarchical language model proved most effective for element-based XML document retrieval: it improved exhaustivity, although it harmed specificity.

A Hangul Document Image Retrieval System Using Rank-based Recognition (웨이브렛 특징과 순위 기반 인식을 이용한 한글 문서 영상 검색 시스템)

  • Lee Duk-Ryong;Kim Woo-Youn;Oh Il-Seok
    • The Journal of the Korea Contents Association / v.5 no.2 / pp.229-242 / 2005
  • We constructed a full-text retrieval system for scanned Hangul document images. The system consists of three components: preprocessing, recognition, and retrieval. The retrieval algorithm uses recognition results up to rank k. The algorithm is not only insensitive to recognition errors, but also lets the user control recall and precision. For objective performance evaluation, we used scanned images of the Journal of Korea Information Science Society provided by KISTI. The system was shown to be practical through the evaluation of recognition and retrieval rates.
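
One way to read "recognition results up to rank k" is sketched below. It assumes that the recognizer emits a ranked list of candidate characters for each character position and that a query matches wherever each of its characters appears among the top k candidates at consecutive positions. The abstract does not give the actual algorithm, so the data layout and the function name `find_matches` are hypothetical.

```python
from typing import List


def find_matches(candidates: List[List[str]], query: str, k: int) -> List[int]:
    """Return start positions where the query matches within the top-k candidates.

    candidates[i] is the recognizer's ranked candidate list for position i,
    best guess first (an assumed layout, not taken from the paper).
    """
    hits = []
    for start in range(len(candidates) - len(query) + 1):
        if all(ch in candidates[start + i][:k] for i, ch in enumerate(query)):
            hits.append(start)
    return hits


# The recognizer's first guess at position 1 is wrong ("o" instead of "a").
ranked = [["c", "e"], ["o", "a"], ["t", "f"]]
strict = find_matches(ranked, "cat", k=1)   # top-1 only: the error blocks the match
relaxed = find_matches(ranked, "cat", k=2)  # top-2: the match is recovered
```

Raising k tolerates more recognition errors and so favors recall, while lowering k favors precision, which matches the user-controllable trade-off the abstract describes.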

A Study on the Effect of Data Fusion on the Retrieval Effectiveness of Web Documents (데이터 결합이 웹 문서 검색성능에 미치는 영향 연구)

  • Park, Ok-Hwa;Chung, Young-Mee
    • Journal of Information Management / v.38 no.1 / pp.1-19 / 2007
  • This study investigates the effect of data fusion on retrieval effectiveness through an experiment that combines multiple representations of Web documents. The document representations combined in the study are content terms, links, anchor text, and URL. The experimental results showed that data fusion combining document representation methods in the Web environment did not bring any significant improvement in retrieval effectiveness.
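
Data fusion over several document representations can be sketched as follows. The abstract does not name the fusion rule used in the study, so this example substitutes one common choice: min-max score normalization followed by summation (often called CombSUM). The run contents and function names are hypothetical.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one run's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs: List[Dict[str, float]]) -> List[str]:
    """Fuse runs by summing normalized scores; ties are broken by document id."""
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from two representations of the same documents.
content_run = {"a": 3.0, "b": 1.0, "c": 2.0}
anchor_run = {"a": 5.0, "b": 10.0, "c": 9.0}
fused_ranking = comb_sum([content_run, anchor_run])
```

Here document c ranks first after fusion even though neither run alone ranks it first, which shows how a combined ranking can differ from each of its inputs.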

Document Retrieval using Concept Network (개념 네트워크를 이용한 정보 검색 방법)

  • Hur, Won-Chang;Lee, Sang-Jin
    • Asia Pacific Journal of Information Systems / v.16 no.4 / pp.203-215 / 2006
  • The advent of the knowledge management (KM) concept has led many organizations to seek an effective way to make use of their knowledge. But the absence of the right tools for systematic handling of unstructured information makes it difficult to automatically retrieve and share relevant information that exactly meets users' needs. We propose a systematic method to enable content-based information retrieval from a corpus of unstructured documents. In our method, a document is represented by several key terms that are automatically selected based on their quantitative relevancy to the document. The relevancy is calculated using the traditional TFIDF measure, which is widely accepted in related research, but to improve the effectiveness of the measure we exploit a 'concept network' that represents term-term relationships. In particular, in constructing the concept network, we also consider the relative positions of terms occurring in a document. A prototype system has been implemented for experiments. The experimental results show that our approach can achieve higher performance than the conventional TFIDF method.
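
The baseline step in the abstract above, selecting key terms by TFIDF, can be sketched as follows. This is only the conventional measure, assuming whitespace tokenization and the plain `tf * log(N / df)` weighting. The concept network and the term-position weighting are not reproduced because the abstract does not define them. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tfidf_key_terms(docs: List[str], top_k: int = 3) -> List[List[str]]:
    """Return the top_k highest-scoring TFIDF terms for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    key_terms = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        key_terms.append(ranked[:top_k])
    return key_terms


docs = [
    "knowledge management tools",
    "knowledge retrieval systems",
    "knowledge sharing",
]
terms = tfidf_key_terms(docs, top_k=2)
```

Because "knowledge" occurs in every document, its inverse document frequency is zero and it is never chosen as a key term, which is the behavior TFIDF is meant to provide.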

Expected Matching Score Based Document Expansion for Fast Spoken Document Retrieval (고속 음성 문서 검색을 위한 Expected Matching Score 기반의 문서 확장 기법)

  • Seo, Min-Koo;Jung, Gue-Jun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2006.11a / pp.71-74 / 2006
  • Much work has been done on retrieving audio segments that contain human speech without captions. To retrieve newly coined words and proper nouns, subwords have commonly been used as indexing units in conjunction with query or document expansion. Of these, document expansion with subwords has the serious drawback of large computational overhead. Therefore, in this paper, we propose Expected Matching Score based document expansion, which effectively reduces computational overhead without much loss in retrieval precision. Experiments show a 13.9-fold speedup at a loss of 0.2% in retrieval precision.

Neural Net Based User Feedback Learning Mechanism for Distributed Information Retrieval (분산 정보 검색을 위한 신경망 기반 사용자 피드백 학습 메카니즘)

  • Choi, Yong S.
    • The Journal of Korean Association of Computer Education / v.4 no.2 / pp.85-95 / 2001
  • Since documents on the Web are naturally partitioned into many document databases, efficient information retrieval requires identifying the document databases that are most likely to provide documents relevant to the query and then querying those databases. We propose a neural net based user feedback learning mechanism for such efficient information retrieval. The presented mechanism learns about the underlying document databases using the relevance feedback obtained from users' retrieval experiences. For a given query, the learning mechanism, once sufficiently trained, discovers the document databases associated with the relevant documents and retrieves those documents effectively.

An Indexing Scheme for Efficient Retrieval and Update of Structured Documents Based on GDIT (GDIT를 기반으로 한 구조적 문서의 효율적 검색과 갱신을 위한 인덱스 설계)

  • Kim, Young-Ja;Bae, Jong-Min
    • The Transactions of the Korea Information Processing Society / v.7 no.2 / pp.411-425 / 2000
  • Information retrieval systems for structured documents written in SGML or XML support partial retrieval of documents. To process queries based on document structure efficiently, such systems require low memory overhead for indexing, quick response times for queries, support for powerful types of user queries, and minimal updates of the index structure when documents are updated. This paper suggests the Global Document Instance Tree (GDIT) and proposes an effective indexing scheme and query processing algorithms based on the GDIT. The indexing scheme maintains indexing and retrieval efficiency and also guarantees minimal updates of the index structure when document structures are updated.
