• Title/Summary/Keyword: Document Retrieval

Designing Requisite Techniques of Storage Structure Supporting Efficient Retrieval in Semantic Web (시멘틱 웹의 효율적 검색을 지원하는 저장 구조의 요소 기술 설계)

  • Shin Pan-Seop
    • Journal of the Korea Computer Industry Society / v.7 no.3 / pp.227-236 / 2006
  • The Semantic Web is gaining popularity as the next web environment, and research on ontology languages for representing the semantic relations among its resources is correspondingly active. Ontology languages such as RDF and DAML+OIL appeared at the start of this research, but they are limited in describing the characteristics of resources and in clearly defining the relations between resources. The W3C therefore proposed OWL as the next standard language for describing resources; OWL compensates for the representational gaps in RDF and RDF Schema. In this paper, we build an ontology in OWL to implement an online retrieval system and propose a structure for storing ontology documents in a relational database (RDB). The structure supports the OWL features of equivalence, heterogeneous, inverse, union, and one-of relationships between classes or properties. We classify the elements by which OWL extends RDF Schema, and we propose a method of storing OWL in an RDB for interoperability with the many RDB-based applications. Finally, we implement an OWL-based storage and retrieval system that provides advanced search functions.

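
The idea of keeping OWL relationships in a relational database can be illustrated with a minimal sketch. It is not the schema proposed in the paper above: the table name `owl_relation`, its columns, the `ex:` resource names, and the helper `equivalents` are all assumptions made for illustration, and `owl:unionOf` is flattened to one row per member for simplicity.

```python
import sqlite3

# Hypothetical schema: one row per OWL relationship between two named resources.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE owl_relation ("
    " subject TEXT NOT NULL,"
    " relation TEXT NOT NULL,"   # e.g. owl:equivalentClass, owl:inverseOf
    " object TEXT NOT NULL)"
)

# Illustrative data only; real OWL expresses unionOf as an RDF list.
rows = [
    ("ex:Car", "owl:equivalentClass", "ex:Automobile"),
    ("ex:hasParent", "owl:inverseOf", "ex:hasChild"),
    ("ex:Vehicle", "owl:unionOf", "ex:Car"),
    ("ex:Vehicle", "owl:unionOf", "ex:Truck"),
]
conn.executemany("INSERT INTO owl_relation VALUES (?, ?, ?)", rows)


def equivalents(conn, name):
    """Return classes declared equivalent to `name`, in either direction."""
    cur = conn.execute(
        "SELECT object FROM owl_relation"
        " WHERE relation = 'owl:equivalentClass' AND subject = ?"
        " UNION"
        " SELECT subject FROM owl_relation"
        " WHERE relation = 'owl:equivalentClass' AND object = ?",
        (name, name),
    )
    return sorted(r[0] for r in cur)
```

A search for `ex:Automobile` could then be expanded with `equivalents(conn, "ex:Automobile")` before querying the document store, which is one way an equivalence relationship might support retrieval.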

A Conceptual Framework for an Information Behavior Model Based on the Collaboration Perspective between User and System for Information Retrieval

  • Yangyuen, Wachira;Phetkaew, Thimaporn;Nuntapichai, Siwanath
    • Journal of Information Science Theory and Practice / v.8 no.3 / pp.30-46 / 2020
  • This research aimed (1) to study and analyze the ability of current information retrieval (IR) systems based on views of information behavior (IB), and (2) to propose a conceptual framework for an IB model based on the collaboration between the system and user, with the intent of developing an IR system that can apply intelligent techniques to enhance system efficiency. The methods in this study consisted of (1) document analysis, which included studying the characteristics and efficiencies of current IR systems and studying IB models in the digital environment, and (2) implementation of the Delphi technique through in-depth interviews with experts. The research results were presented in three main parts. First, the IB model in the digital environment was categorized into eight stages, different from traditional IB, which can correspond to all behaviors and be applied to an IR system. Second, insufficient functions and log file storage hinder the system from effectively understanding and accommodating user behavior in the digital environment. Last, the proposed conceptual framework illustrated that there are stages at which intelligent techniques can be added to the IR system, based on the collaboration perspective between the user and system, to boost the users' cognitive ability and make the IR system more user-friendly. Importantly, the conceptual framework for the IB model based on the collaboration perspective between the user and system for IR assisted the ability of information systems to learn, recognize, and comprehend human IB according to individual characteristics, leading to enhanced interaction between the system and users.

Design and Implementation of Web-based Retrieval System for Massive Image Contents in Green Computing Environment (그린 환경을 위한 웹기반 대용량 이미지 콘텐츠 검색 시스템 설계 및 구현)

  • Na, Moon-Sung;Lee, Jae-Dong
    • Journal of Korea Society of Industrial Information Systems / v.14 no.5 / pp.113-123 / 2009
  • As environmental issues emerge, many efforts are being made globally to reduce the waste of energy and resources for green growth, including lowering carbon emissions and replacing paper documents with digital files and images. On the other hand, users may need much time and effort to find the right image files on the web, where enormous numbers of unstandardized digital files proliferate. Power and resource consumption may therefore rise again in searching for and retrieving files. This paper proposes an efficient system design and implementation for fast and precise retrieval of massive image contents, in order to save energy and resources. Ultimately, it will contribute to green growth in the computing environment.

Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm (K-means 알고리즘을 이용한 계층적 클러스터링에서의 클러스터 계층 깊이 선택)

  • Lee, Won-Hee;Lee, Shin-Won;Chung, Sung-Jong;An, Dong-Un
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.150-156 / 2008
  • Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means reduces the time complexity when the number of variables is large. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system, named CONDOR, that builds a hierarchical structure by clustering documents with the K-means algorithm. We evaluated performance for different hierarchy depths and for an initial number of centroids that is not fixed but varies with the relative number of documents corresponding to the given queries. Compared with the regular method, in which the initial centroids are established in advance, our method improved performance considerably.
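
Building a cluster hierarchy by applying K-means recursively, with the depth as an explicit parameter, can be sketched as follows. This is an illustrative simplification and not the CONDOR system described above: the function names `kmeans` and `build_hierarchy`, the deterministic seeding, and the dictionary-based tree are all assumptions.

```python
import math


def kmeans(points, k, iters=20):
    """Plain K-means with deterministic seeding (evenly spaced input points).

    Assumes len(points) >= k; returns the non-empty groups.
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def build_hierarchy(points, k, depth):
    """Split `points` with K-means recursively, down to `depth` levels."""
    if depth == 0 or len(points) <= k:
        return {"points": points, "children": []}
    children = [build_hierarchy(g, k, depth - 1) for g in kmeans(points, k)]
    return {"points": points, "children": children}
```

Calling `build_hierarchy(vectors, k=2, depth=3)` on a list of equal-length numeric tuples would give a tree at most three levels deep. Varying `depth` is the knob an evaluation of hierarchy depth would turn.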

Design and Implementation of a Query Processor for Document Management Systems (문서관리시스템을 위한 질의처리기 설계 및 구현)

  • U, Jong-Won;Yun, Seung-Hyeon;Yu, Jae-Su
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1419-1432 / 1999
  • The Document Management System (DMS) is a system that retrieves and manages library information efficiently. Because a DMS manages its information in a single table, it does not need the join and view operations that are costly in a traditional DBMS. In addition, a DMS requires new operations, specific to its characteristics, that existing DBMSs do not support. In this paper, we define a data language that represents the structure definition and processing of data in the DMS. In particular, we define the Ranking and Proximity operations needed in document retrieval. We also design and implement a query processor to process queries written in this data language. When the existing query processors of relational DBMSs are used as the query processor of a DMS, they degrade overall system performance. The proposed query processor not only overcomes this problem but also supports the new operations needed in a DMS.

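
The abstract above does not define its Ranking and Proximity operations, so the following is only a generic sketch of what such operations commonly mean in document retrieval. The function names `within` and `rank`, whitespace tokenization, and term-frequency scoring are assumptions, not the paper's data language.

```python
def positions(tokens, term):
    """Indexes at which `term` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == term]


def within(text, a, b, max_gap):
    """Proximity: True if terms `a` and `b` occur at most `max_gap` words apart."""
    tokens = text.lower().split()
    return any(
        abs(i - j) <= max_gap
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


def rank(docs, terms):
    """Ranking: order document ids by total query-term frequency, highest first.

    Documents that contain none of the terms are dropped; ties break by id.
    """
    def score(text):
        tokens = text.lower().split()
        return sum(tokens.count(t) for t in terms)

    scored = [(score(text), doc_id) for doc_id, text in docs.items()]
    return [doc_id for s, doc_id in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]
```

Both functions scan a single collection of documents, which matches the single-table setting the abstract describes, where no join is required to answer such a query.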

Design and Implementation of an XML Data Encryption System considering Validation (유효성을 고려한 XML 데이타 암호화 시스템의 설계 및 구현)

  • 남궁영환;박대하;허승호;백두권
    • Journal of KIISE:Databases / v.29 no.6 / pp.417-428 / 2002
  • XML (eXtensible Markup Language) is effective for information retrieval and sharing but has weaknesses related to data security. Current XML security research, such as XML digital signature, XML data encryption, and XML access control, addresses this problem but disregards the validity of the XML document. Validity should be considered for secure information sharing in an XML-based environment. In this paper, we design and implement a system that supports both security and validation for XML documents. Our system encrypts data and keeps the XML document valid by referencing a new XML schema namespace. In addition, it provides an XML schema security function through an XML schema digital signature. When generating the XML schema digital signature, the DOMHash method, which is faster than the canonical XML method, is applied to the XML schema. In conclusion, our system shows improved flexibility, scalability, and reliability compared with existing XML security research.

Document Ranking of Web Document Retrieval Systems (웹 정보검색 시스템의 문서 순위 결정)

  • An, Dong-Un;Kang, In-Ho
    • Journal of Information Management / v.34 no.2 / pp.55-66 / 2003
  • The Web is rich in varied sources of information, containing document content, multimedia data, shopping materials, and so on. Because web document collections are massive and heterogeneous, users look for various types of target pages. We can classify user queries into three categories according to the user's intent: content search, site search, and service search. In this paper, we show that different strategies are needed to meet each user need. We also show the properties of content information, link information, and URL information for each class of user query. In content search, content information gave good results, but combining it with link information and URL information lowered performance. In site search, combining link information and URL information increased performance.

An XML Tag Indexing Method Using on Lexical Similarity (XML 태그를 분류에 따른 가중치 결정)

  • Jeong, Hye-Jin;Kim, Yong-Sung
    • The KIPS Transactions:PartB / v.16B no.1 / pp.71-78 / 2009
  • For more effective index extraction and index weight determination, studies extract indices using document structure as well as content. However, most studies concentrate on calculating the importance of context rather than that of XML tags, and these conventional studies judge importance by common sense rather than verifying it through objective experiment. For automatic indexing using the tag information of XML documents, which have become the standard for web document management, this paper classifies the major tags that make up a paper according to their importance and calculates the weights of terms extracted from low-weight tags. Using the weights obtained, it then proposes a method of calculating the final weight by updating the weights of terms extracted from high-weight tags. To determine more objective weights, this paper tests which tags users consider important and reflects the result in the weight calculation by classifying tag importance accordingly. Finally, it verifies the effectiveness of the index weights calculated by the proposed method by comparing search performance against index weights calculated with an existing method of determining tag importance.
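
Weighting index terms by the XML tag in which they occur can be sketched as below. This is a simplification rather than the paper's procedure: the tag names and the values in `TAG_WEIGHTS` are hypothetical, and each term simply accumulates the weight of every tag it appears in, instead of the low-to-high update the abstract describes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tag importance values; the paper derives its own experimentally.
TAG_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def term_weights(xml_text, tag_weights=TAG_WEIGHTS):
    """Sum, for each term, the weight of every tag whose text contains it."""
    root = ET.fromstring(xml_text)
    weights = defaultdict(float)
    for elem in root.iter():
        w = tag_weights.get(elem.tag)
        if w is None or not elem.text:
            continue  # skip tags without an assigned weight or without text
        for term in elem.text.lower().split():
            weights[term] += w
    return dict(weights)
```

With these assumed weights, a term that appears in both the title and the abstract ends up weighted higher than one that appears only in the body.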

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions:PartB / v.10B no.7 / pp.735-742 / 2003
  • In an information retrieval system, document clustering provides user convenience and visual effects by rearranging the retrieved documents according to specific topics. In this paper, we cluster documents using the K-Means algorithm and present the effect of index term weighting schemes on document clustering. To verify the experiment, we applied the Latent Semantic Indexing approach to illustrate the clustering results and analyzed them in 2-dimensional space. Experimental results showed that when local weighting, global weighting, and a normalization factor are applied, the density of clustering is higher than that of similar or identical weighting schemes in 2-dimensional space. In particular, the logarithm of local and global weighting is noticeable.
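
The combination of local weighting, global weighting, and a normalization factor can be illustrated with one common choice for each: logarithmic term frequency, logarithmic inverse document frequency, and cosine normalization. Which exact formulas the paper compared is not stated in the abstract, so the formulas and the function name `weight_documents` here are assumptions.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Return one {term: weight} dict per document.

    local  = 1 + log(tf)      (logarithmic term frequency)
    global = log(N / df)      (logarithmic inverse document frequency)
    Each vector is then scaled to unit length (cosine normalization).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors
```

The resulting vectors could be fed to K-means directly, or first reduced with a truncated singular value decomposition for an LSI-style 2-dimensional view; that reduction step is omitted here.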

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity (문장-질의 유사성을 이용한 웹 정보 검색의 성능 향상)

  • Park Eui-Kyu;Ra Dong-Yul;Jang Myung-Gil
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.406-415 / 2005
  • The growth of the Internet has led to a web containing a huge number of documents. Increasing importance is therefore given to web information retrieval technology that can provide users with documents containing the information they want. This paper proposes several techniques that are effective in improving web information retrieval. Similarity between a document and the query is a major source of information exploited by conventional systems. However, we suggest a technique that makes use of the similarity between a sentence and the query. We introduce a technique for computing an approximate sentence-query similarity score even without mature natural language processing technology. The amount of computation for this task was shown to be linear in the number of documents in the total collection, which implies that practical systems can use this technique. The next important technique proposed in this paper is to use stratification of documents when re-ranking the documents to output, which was shown to lead to significant improvement in performance. We furthermore showed that using hyperlinks, anchor texts, and titles can enhance performance. To justify the proposed techniques, we developed a large-scale web information retrieval system and used it for experiments.
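
One simple way to approximate sentence-query similarity without natural language processing is to measure, for each sentence, the fraction of query terms it contains and keep the best sentence. The sketch below takes that approach. It is not the scoring function from the paper: the name `sentence_query_score`, the punctuation-based sentence splitting, and the overlap measure are assumptions.

```python
import re


def sentence_query_score(document, query):
    """Best fraction of query terms found within any single sentence."""
    q = set(query.lower().split())
    if not q:
        return 0.0
    best = 0.0
    # Naive sentence split on terminal punctuation; no parsing is involved.
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"\w+", sentence.lower()))
        best = max(best, len(q & words) / len(q))
    return best
```

Each document is scanned once, so scoring a collection costs time proportional to its total text length. A document whose query terms are scattered across sentences scores lower than one that holds them all in a single sentence.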