• Title/Summary/Keyword: Document information retrieval

Search Result 411, Processing Time 0.025 seconds

Service-centric Object Fragmentation Model for Efficient Retrieval and Management of XML Documents (XML 문서의 효율적인 검색과 관리를 위한 SCOF 모델)

  • Jeong, Chang-Hoo
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.595-598
    • /
    • 2007
  • Vast amount of XML documents raise interests in how they will be used and how far their usage can be expanded. This paper has two central goals: 1) easy and fast retrieval of XML documents or relevant elements; and 2) efficient and stable management of large-size XML documents. The keys to develop such a practical system are how to segment a large XML document to smaller fragments and how to store them. In order to achieve these goals, we designed SCOF(Service-centric Object Fragmentation) model, which is a semi-decomposition method based on conversion rules provided by XML database managers. Keyword-based search using SCOF model then retrieves the specific elements or attributes of XML documents, just as typical XML query language does. Even though this approach needs the wisdom of managers in XML document collection, SCOF model makes it efficient both retrieval and management of massive XML documents.

  • PDF

The Project and Prospects of Old Documents Information Systems in Korea (한국 고문헌 정보시스템의 구축 및 전망)

  • Kang Soon-Ae
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.31 no.4
    • /
    • pp.83-112
    • /
    • 1997
  • The purpose of this paper Is to describe the matters to plan the best information systems in Korean old books. It analyzes: i) a range of definition of old books, ii) its characteristics and current state of processing the old documents, iii) the scope of automation and building up the library institution, iv) the construction of Korean old books Information systems, v) its case study, and vi) the evaluation and vision of system. The old document information system have been organized on the basis of library networks systems with the National Central Library as leader, its implemented system has the subsystem such as cataloging system, annotation system, full-text or image-based system, and retrieval system. In case study, it is suggested two examples which has been built in the National Central Library and Sung Kyun Kwan university. finally, it provides the evaluation criteria and vision for the library which designs the old document information systems.

  • PDF

An Ontology-based Knowledge Management System - Integrated System of Web Information Extraction and Structuring Knowledge -

  • Mima, Hideki;Matsushima, Katsumori
    • Proceedings of the CALSEC Conference
    • /
    • 2005.03a
    • /
    • pp.55-61
    • /
    • 2005
  • We will introduce a new web-based knowledge management system in progress, in which XML-based web information extraction and our structuring knowledge technologies are combined using ontology-based natural language processing. Our aim is to provide efficient access to heterogeneous information on the web, enabling users to use a wide range of textual and non textual resources, such as newspapers and databases, effortlessly to accelerate knowledge acquisition from such knowledge sources. In order to achieve the efficient knowledge management, we propose at first an XML-based Web information extraction which contains a sophisticated control language to extract data from Web pages. With using standard XML Technologies in the system, our approach can make extracting information easy because of a) detaching rules from processing, b) restricting target for processing, c) Interactive operations for developing extracting rules. Then we propose a structuring knowledge system which includes, 1) automatic term recognition, 2) domain oriented automatic term clustering, 3) similarity-based document retrieval, 4) real-time document clustering, and 5) visualization. The system supports integrating different types of databases (textual and non textual) and retrieving different types of information simultaneously. Through further explanation to the specification and the implementation technique of the system, we will demonstrate how the system can accelerate knowledge acquisition on the Web even for novice users of the field.

  • PDF

Korean Document Classification Using Extended Vector Space Model (확장된 벡터 공간 모델을 이용한 한국어 문서 분류 방안)

  • Lee, Samuel Sang-Kon
    • The KIPS Transactions:PartB
    • /
    • v.18B no.2
    • /
    • pp.93-108
    • /
    • 2011
  • We propose a extended vector space model by using ambiguous words and disambiguous words to improve the result of a Korean document classification method. In this paper we study the precision enhancement of vector space model and we propose a new axis that represents a weight value. Conventional classification methods without the weight value had some problems in vector comparison. We define a word which has same axis of the weight value as ambiguous word after calculating a mutual information value between a term and its classification field. We define a word which is disambiguous with ambiguous meaning as disambiguous word. We decide the strengthness of a disambiguous word among several words which is occurring ambiguous word and a same document. Finally, we proposed a new classification method based on extension of vector dimension with ambiguous and disambiguous words.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.163-172
    • /
    • 2007
  • The web as global network includes text document, video, sound, etc and connects each distributed information using link Through development of web, it accumulates abundant information and the main is text based documents. Most of user use the web to retrieve information what they want. So, numerous researches have progressed to retrieve the text documents using the many methods, such as probability, statistics, vector similarity, Bayesian, and so on. These researches however, could not consider both the subject and the semantics of documents. As a result user have to find by their hand again. Especially, it is more hard to find the korean document because the researches of korean document classification is insufficient. So, to overcome the previous problems, we propose the korean document classification method for semantic retrieval. This method firstly, extracts TF value and RV value of concepts that is included in document, and maps into U-WIN that is korean vocabulary dictionary to select the topic of document. This method is possible to classify the document semantically and showed the efficiency through experiment.

Document Ranking of Web Document Retrieval Systems (웹 정보검색 시스템의 문서 순위 결정)

  • An, Dong-Un;Kang, In-Ho
    • Journal of Information Management
    • /
    • v.34 no.2
    • /
    • pp.55-66
    • /
    • 2003
  • The Web is rich with various sources of information. It contains the contents of documents, multimedia data, shopping materials and so on. Due to the massive and heterogeneous web document collections, users want to find various types of target pages. We can classify user queries as three categories according to users'intent, content search, the site search, and the service search. In this paper, we present that different strategies are needed to meet the need of a user. Also we show the properties of content information, link information and URL information according to the class of a user query. In the content search, content information showed the good result. However, we lost the performance by combining link information and URL information. In the site search, we could increase the performance by combining link information and URL information.

A Study on Semantic Based Indexing and Fuzzy Relevance Model (의미기반 인덱스 추출과 퍼지검색 모델에 관한 연구)

  • Kang, Bo-Yeong;Kim, Dae-Won;Gu, Sang-Ok;Lee, Sang-Jo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04b
    • /
    • pp.238-240
    • /
    • 2002
  • If there is an Information Retrieval system which comprehends the semantic content of documents and knows the preference of users. the system can search the information better on the Internet, or improve the IR performance. Therefore we propose the IR model which combines semantic based indexing and fuzzy relevance model. In addition to the statistical approach, we chose the semantic approach in indexing, lexical chains, because we assume it would improve the performance of the index term extraction. Furthermore, we combined the semantic based indexing with the fuzzy model, which finds out the exact relevance of the user preference and index terms. The proposed system works as follows: First, the presented system indexes documents by the efficient index term extraction method using lexical chains. And then, if a user tends to retrieve the information from the indexed document collection, the extended IR model calculates and ranks the relevance of user query. user preference and index terms by some metrics. When we experimented each module, semantic based indexing and extended fuzzy model. it gave noticeable results. The combination of these modules is expected to improve the information retrieval performance.

  • PDF

Design and Implementation of a Hypermedia System for Effective Multimedia Information Retrieval (멀티미디어 정보의 효율적인 검색을 위한 하이퍼미디어 시스템의 설계와 구현)

  • 고영곤;최윤철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.8
    • /
    • pp.1213-1225
    • /
    • 1993
  • Hypermedia systems have the browsing mechanism using links and provide navigation tools to retrieve and represent multimedia information. In this study we designed and implemented a hypermedia system which has the hierarchical group and local map for effective navigation. We also propose the clustering mechanism which constructs a cluster tree and uses this knowledge for navigation. The system has been designed to integrate the browsing and searching function of the hypermedia system for efficient multimedia information retrieval and user-interface. This system can be used to develop hypermedia application systems in the area of encyclopedia, reference document information, electronic dictionary and electronic book.

  • PDF

An Index Mechanism and Structure Information for Efficient Retrieval of XML DTD (XML DTD의 효율적인 검색을 위한 구조 정보 및 인덱스 메카니즘)

  • 김영란
    • Journal of the Korea Society of Computer and Information
    • /
    • v.8 no.3
    • /
    • pp.80-86
    • /
    • 2003
  • XML is being watched with keen interest for the communication and saving of information. Information represented in XML provides more accuracy and a higher-speed of reference after the process of being implication. But, it is difficult that XML document is exchanged or shared in different area such as electronic commerce or digital library. Because, XML document is being different in syntax but similar in logic, with using structured difference analysis. In this thesis, we converted object-oriented class diagram to XML DTD and designed an index mechanism based on the structure information for the converted XML DTD. With our methods, we could effectively and lastly retrieve the specific element and respect to usefully access element by simple operations.

  • PDF

Measurement of Document Similarity using Term/Term-pair Features and Neural Network (단어/단어쌍 특징과 신경망을 이용한 두 문서간 유사도 측정)

  • Kim Hye Sook;Park Sang Cheol;Kim Soo Hyung
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.12
    • /
    • pp.1660-1671
    • /
    • 2004
  • This paper proposes a method for measuring document similarity between two documents. One of the most significant ideas of the method is to estimate the degree of similarity between two documents based on the frequencies of terms and term-pair, existing in both the two documents. In contrast to conventional methods which takes only one feature into account, the proposed method considers several features at the same time and meatures the similarity using a neural network. To prove the superiority of our method, two experiments have been conducted. One is to verify whether the two input documents are from the same document or not. The other is a problem of information retrieval with a document as the query against a large number of documents. In both the two experiments, the proposed method shows higher accuracy than two conventional methods, Cosine similarity measurement and a term-pair method.