• Title/Summary/Keyword: Document Search


Document Clustering Technique by Domain Ontology (도메인 온톨로지에 의한 문서 군집화 기법)

  • Kim, Woosaeng;Guan, Xiang-Dong
    • Journal of Information Technology Applications and Management / v.23 no.2 / pp.143-152 / 2016
  • Document clustering lets us organize, manage, search, and process documents efficiently. Documents are generally clustered in a high-dimensional feature space because they contain many terms. In this paper, we propose a new method that clusters documents efficiently in a low-dimensional feature space by finding the core concepts in a domain ontology that corresponds to the documents' subject area. The experiment shows that our clustering method performs well.
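
The idea of clustering in a low-dimensional concept space can be illustrated with a minimal sketch. The abstract does not say how the core concepts are extracted, so everything here is an assumption: `TERM_TO_CONCEPT` is a hypothetical toy ontology, and assigning each document to its dominant concept stands in for the paper's clustering step.

```python
from collections import Counter

# Hypothetical toy ontology: maps surface terms to core concepts.
# A real domain ontology would be far larger and hierarchical.
TERM_TO_CONCEPT = {
    "striker": "football",
    "goal": "football",
    "penalty": "football",
    "racket": "tennis",
    "serve": "tennis",
    "volley": "tennis",
}
CONCEPTS = sorted(set(TERM_TO_CONCEPT.values()))


def concept_vector(text):
    """Project a document onto the concept axes (one dimension per concept)."""
    counts = Counter(
        TERM_TO_CONCEPT[token]
        for token in text.lower().split()
        if token in TERM_TO_CONCEPT
    )
    return [counts[concept] for concept in CONCEPTS]


def assign_cluster(text):
    """Assign a document to its dominant concept, or None if nothing matches."""
    vector = concept_vector(text)
    if max(vector) == 0:
        return None
    return CONCEPTS[vector.index(max(vector))]
```

With six terms but only two concepts, each document is described by a two-dimensional vector instead of one dimension per term, which is the dimensionality reduction the abstract refers to.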

Information Retrieval System for Mobile Devices (모바일 기기를 위한 정보검색 시스템)

  • Kim, Jae-Hoon;Kim, Hyung-Chul
    • Journal of Advanced Marine Engineering and Technology / v.33 no.4 / pp.569-577 / 2009
  • Mobile information retrieval is an evolving branch of information retrieval centered on mobile and ubiquitous environments. In general, mobile devices are characterized by light weight, low power, small memory, small displays, limited input/output, low bandwidth, and so on. Some of these characteristics make it impossible to apply general information retrieval to mobile environments without modification. To alleviate this problem, we design and implement an information retrieval system for mobile devices such as wireless phones, PDAs, and handheld devices. We use document summarization techniques to mitigate the limitation of the small display, and user profiles to retrieve the most suitable documents for each user in personalized search. Furthermore, we use meta-search to reduce the burden of visiting several portal sites. We implemented and demonstrated the proposed mobile information retrieval system in the travel domain, and it received good subjective evaluations from users.

A Study on Compensation Management Geographic Information System Construction Using Cadastral Information (지적정보를 활용한 보상관리 지리정보시스템 구축에 관한 연구)

  • 심정민;이창경
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2004.11a / pp.479-484 / 2004
  • At present, data on compensation and payment is filed and managed as paper documents or Excel files. For large-scale dam construction, data on the submerged areas is managed inefficiently in terms of time and cost, relying on administrative and data-formatting staff. The sheer volume of data also raises the problem of where and how to store it. To address these issues, we plan a computerized management system that links the location information and property information relevant to compensation, which increases the usefulness of the compensation information, and that applies a document management system to the geographic information system. The system supports searching for uncompensated parcels within a given area and for compensation information on compensated parcels. Building this geographic information system is expected to yield various information benefits through its area search functions for boundary surveys, actual-condition surveys, uncompensated-area searches, and re-compensated-area searches.


Development of Similarity-Based Document Clustering System (유사성 계수에 의한 문서 클러스터링 시스템 개발)

  • Woo Hoon-Shik;Yim Dong-Soon
    • Proceedings of the Society of Korea Industrial and System Engineering Conference / 2002.05a / pp.119-124 / 2002
  • Data clustering is of great interest in many data mining applications. In document clustering, a document is represented as a data point in a high-dimensional space, so document clustering can be accomplished with general data clustering techniques. In this paper, we introduce a document clustering system based on similarity among documents. The system consists of three functions: 1) gathering documents with a search agent; 2) determining similarity coefficients between every pair of documents from term frequencies; and 3) clustering documents using the similarity coefficients. In particular, the clustering is performed by a hybrid algorithm that combines genetic and K-means methods.
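
Step 2 of the system, deriving similarity coefficients from term frequencies, can be sketched as follows. The abstract does not name the coefficient, so cosine similarity is used here purely as an assumption, and the function names are hypothetical. The search agent and the hybrid genetic/K-means clustering are not shown.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count whitespace-separated, lower-cased terms in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[term] * tf_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Pairwise similarity coefficients between all documents."""
    vectors = [term_frequencies(doc) for doc in documents]
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

The resulting matrix is symmetric with ones on the diagonal for non-empty documents, and it could serve as the input to whichever clustering algorithm is applied in step 3.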


A Study on the Performance of Structured Document Retrieval Using Node Information (노드정보를 이용한 문서검색의 성능에 관한 연구)

  • Yoon, So-Young
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.103-120 / 2007
  • A node is a semantic unit and a part of a structured document. Information retrieval from structured documents offers an opportunity to go below the document level in search of relevant information, making any element of a structured document a retrievable unit. The node-based document retrieval comprises several similarity calculation methods and an extended node retrieval method that uses structure information. Retrieval performance is hardly influenced by the method used to determine document similarity. The extended node method outperformed the others overall.

Research on Function and Policy for e-Government System using Semantic Technology (전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구)

  • Go, Gwang-Seop;Jang, Yeong-Cheol;Lee, Chang-Hun
    • 한국디지털정책학회:학술대회논문집 / 2007.06a / pp.79-87 / 2007
  • This paper aims to offer a solution based on semantic document classification that improves e-Government utilization and efficiency for people who use their own information retrieval systems and linguistic expressions. Generally, semantic document classification classifies documents based on the diverse relationships between keywords in a document without fully describing the hierarchical concepts between keywords. Our approach considers the deep meanings within the context of the document and radically enhances information retrieval performance. We propose, test, and evaluate the Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the concept hierarchy of various concepts. We recognize that verifying the superiority of the semantic retrieval technology through CoWDC test results and integrating it efficiently into e-Government require the creation of a thesaurus, management of the operating system, expansion of the knowledge base, and national-level improvements in search service and accuracy.


SATS: Structure-Aware Touch-Based Scrolling

  • Kim, Dohyung;Gweon, Gahgene;Lee, Geehyuk
    • ETRI Journal / v.38 no.6 / pp.1104-1113 / 2016
  • Non-linear document navigation refers to the process of repeatedly reading a document at different levels to provide an overview, including selective reading to search for useful information within a document under time constraints. Currently, this function is not supported well by small-screen tablets. In this study, we propose the concept of structure-aware touch-based scrolling (SATS), which allows structural document navigation using region-dependent touch gestures for non-sequential navigation within tablets or tablet-sized e-book readers. In SATS, the screen is divided into four vertical sections representing the different structural levels of a document, where dragging into the different sections allows navigating from the macro to micro levels. The implementation of a prototype is presented, as well as details of a comparative evaluation using typical non-sequential navigation tasks performed under time constraints. The results showed that SATS obtained better performance, higher user satisfaction, and a lower usability workload compared with a conventional structural overview interface.
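
The region-dependent mapping described for SATS can be sketched as a lookup from touch position to structural level. This is only an illustration under stated assumptions: the four sections are taken to be equal-width columns laid out left to right from macro to micro, and the level names in `LEVELS` are hypothetical, since the abstract specifies neither.

```python
# Hypothetical structural levels, ordered from macro to micro.
LEVELS = ("chapter", "section", "paragraph", "line")


def level_for_touch(x, screen_width):
    """Return the structural level for a touch at horizontal position x.

    Assumes four equal-width columns spanning the screen from left to right.
    """
    if screen_width <= 0:
        raise ValueError("screen_width must be positive")
    if not 0 <= x <= screen_width:
        raise ValueError("x must lie within the screen")
    # Scale x to a column index; clamp so the right edge maps to the last column.
    index = min(int(x * len(LEVELS) / screen_width), len(LEVELS) - 1)
    return LEVELS[index]
```

A drag gesture would then scroll by whole units of the level returned for the column in which the drag takes place.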

A Study on the Depth-Oriented Decomposition Indexing Method for Creating and Searching Structured Documents Based-on XML (XML을 이용한 구조적 문서 생성 및 탐색을 위한 깊이중심분할 색인기법에 관한 연구)

  • Yang, Ok-Yul;Lee, Yong-Ju
    • The KIPS Transactions:PartD / v.9D no.6 / pp.1025-1042 / 2002
  • The goal of this study is to generate structured documents that improve the performance of an information retrieval system by using a thesaurus, which holds information on the relations between words (terms), and to study a technique for searching these structured documents. To accomplish this goal, we propose a DODI (Depth-Oriented Decomposition Index) technique for structured documents and an algorithm that uses a thesaurus to search for related information efficiently through this index. We establish a storage system in which the structured documents generated by this index technique are saved in a database through OpenXML, and XML documents are generated through ForXML methods.

Noise Removal using Support Vector Regression in Noisy Document Images

  • Kim, Hee-Hoon;Kang, Seung-Hyo;Park, Jai-Hyun;Ha, Hyun-Ho;Lim, Dong-Hoon
    • The Korean Journal of Applied Statistics / v.25 no.4 / pp.669-680 / 2012
  • Removing noise from document images is a necessary preprocessing step for effective character recognition because it greatly influences the processing speed and performance of character recognition. We have considered spatial filters, such as traditional mean filters and Gaussian filters, and wavelet-transform-based methods for noise reduction in natural images. However, these methods are not effective for removing noise from document images. In this paper, we present noise removal for document images using support vector regression (SVR). The proposed approach consists of two steps: an SVR training step and an SVR test step. In the training step, we construct an optimal prediction model using grid search with cross-validation; in the test step, we apply it to noisy images to remove the noise. We evaluate our SVR-based method both quantitatively and qualitatively for noise removal in Korean, English, and Chinese character documents and compare it with some existing methods. Experimental results indicate that the proposed method is more effective and achieves satisfactory removal results.

Document Clustering Technique by K-means Algorithm and PCA (주성분 분석과 k 평균 알고리즘을 이용한 문서군집 방법)

  • Kim, Woosaeng;Kim, Sooyoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.3 / pp.625-630 / 2014
  • The amount of information is increasing rapidly with the development of the internet and computers. Since this enormous amount of information is managed in document form, it is necessary to search and process it efficiently. Document clustering, which groups related documents by the similarity between them, helps to classify, search, and process large numbers of documents automatically. This paper proposes a method that finds the initial seed points through principal component analysis when documents represented as vectors in the feature vector space are clustered by the K-means algorithm, in order to increase clustering performance. The experiment shows that our method performs better than the traditional K-means algorithm.
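
One way PCA could supply initial seeds for K-means is sketched below. The abstract does not describe the actual seeding rule, so this is an assumed heuristic rather than the authors' method: documents are sorted by their projection onto the first principal component, split into k equal groups, and each group mean becomes a seed. The principal component is approximated by power iteration from a fixed start vector, which can fail if that vector happens to be orthogonal to the component.

```python
import math


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def first_principal_component(vectors, iterations=100):
    """Approximate the leading principal direction by power iteration."""
    center = mean(vectors)
    centered = [[x - m for x, m in zip(v, center)] for v in vectors]
    direction = [1.0] * len(center)  # fixed start vector (assumption)
    for _ in range(iterations):
        # Apply the unnormalised covariance matrix: X^T (X d).
        scores = [sum(a * b for a, b in zip(row, direction)) for row in centered]
        product = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(len(center))]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break
        direction = [x / norm for x in product]
    return center, direction


def pca_seeds(vectors, k):
    """Seeds = means of k equal slices of the data ordered along the first PC."""
    center, direction = first_principal_component(vectors)
    ordered = sorted(vectors, key=lambda v: sum(
        (x - m) * d for x, m, d in zip(v, center, direction)))
    size = len(ordered) / k
    groups = [ordered[round(i * size):round((i + 1) * size)] for i in range(k)]
    return [mean(group) for group in groups if group]


def kmeans(vectors, seeds, iterations=20):
    """Plain K-means (squared Euclidean distance) started from the given seeds."""
    centroids = seeds
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[i])))
            clusters[nearest].append(v)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

Calling `kmeans(points, pca_seeds(points, k))` replaces the random initialisation of traditional K-means with deterministic seeds spread along the direction of greatest variance.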