• Title/Summary/Keyword: Document Order

A Feature-Based Word Spotting for Content-Based Retrieval of Machine-Printed English Document Images (내용기반의 인쇄체 영문 문서 영상 검색을 위한 특징 기반 단어 검색)

  • Jeong, Gyu-Sik;Gwon, Hui-Ung
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.10
    • /
    • pp.1204-1218
    • /
    • 1999
  • Most existing digital libraries for document image retrieval offer only limited retrieval because their indexes are built from document titles and/or abstracts. This paper proposes a word spotting system for full-text retrieval of English document images based on word image shape features. To improve both the efficiency and the precision of retrieval, the system 1) combines the holistic features used in existing word spotting systems, 2) matches images by comparing the order of features within a word in addition to their number and positions, and 3) adopts a two-stage retrieval strategy that first obtains results by image feature matching and then applies OCR (Optical Character Recognition) to part of those results as a filter. The proposed system operates as follows: given a document image, its structure is analyzed and it is segmented into a set of word regions. Word shape features are then extracted and stored. Given a user's text query, the corresponding word image is generated and its features are extracted. These reference features are compared with the stored features to find similar words. The proposed system was implemented on an IBM-PC in a web environment, and experiments were performed with English document images. Experimental results show the effectiveness of the proposed methods.
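
The matching step in this abstract (comparing the number, positions, and order of features) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the paper's method: the feature names, the `(name, normalized_x)` tuple layout, the `position_tolerance` parameter, and the equal weighting of the three terms are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical representation: a word image is a left-to-right list of
# (feature_name, normalized_x_position) pairs, e.g. ascenders and descenders.
def word_similarity(query, candidate, position_tolerance=0.1):
    """Return a score in [0, 1] combining feature count, order, and position."""
    if not query and not candidate:
        return 1.0
    if not query or not candidate:
        return 0.0

    q_names = [name for name, _ in query]
    c_names = [name for name, _ in candidate]

    # Count term: ratio of the smaller feature count to the larger one.
    count_score = min(len(query), len(candidate)) / max(len(query), len(candidate))

    # Order term: how well the two feature-name sequences align in order.
    matcher = SequenceMatcher(None, q_names, c_names, autojunk=False)
    order_score = matcher.ratio()

    # Position term: fraction of aligned features whose positions are close.
    aligned = 0
    close = 0
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            aligned += 1
            q_pos = query[block.a + k][1]
            c_pos = candidate[block.b + k][1]
            if abs(q_pos - c_pos) <= position_tolerance:
                close += 1
    position_score = close / aligned if aligned else 0.0

    # Equal weighting is an arbitrary choice made for this sketch.
    return (count_score + order_score + position_score) / 3
```

A candidate with the same features in a different order scores lower than an identical one, which is the effect the order comparison is meant to capture.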

A Study on the Document viewer optimized for VR environment (VR 환경에 최적화 된 문서 뷰어에 관한 연구)

  • Joo, Yong-Ho;Kim, Sang-Mok;Cho, Ok-Hue
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.5
    • /
    • pp.139-145
    • /
    • 2021
  • This study examines user satisfaction to verify whether full-scale research, development, and commercialization of document viewers for a VR environment are warranted. VR content consists of realistic 3D graphics and 360-degree video, and provides a synesthetic, immersive experience. We developed and tested a VR document viewer prototype that applies this concept to document viewing. The viewer provides an interactive viewing environment that responds to the user's body interaction and the direction of the field of view, and its distinguishing feature is that it can draw a high level of immersion and concentration from the user. The prototype was tested with a group of 100 people who had VR experience and owned VR devices, for about 1 hour a day over 3 days, after which a questionnaire survey with fixed-choice questions was conducted. This is a prototype study of a document viewer suited to a virtual reality environment; such a viewer can lead to a sense of immersion when reading a document, and the study suggests a new direction for document viewers that is effective with respect to visual fatigue and visual perception of the document.

Web Document Clustering based on Graph using Hyperlinks (하이퍼링크를 이용한 그래프 기반의 웹 문서 클러스터링)

  • Lee, Joon;Kang, Jin-Beom;Choi, Joong-Min
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.590-595
    • /
    • 2009
  • With the exponential growth of web documents on the internet, improving the performance of clustering methods for web documents has become important. Web document clustering techniques can offer accurate information and fast information retrieval by clustering web documents according to their semantic relationships. The clustering method based on a mesh graph provides high recall by calculating similarity between documents, but it incurs a high computation cost. This paper proposes a clustering method that uses hyperlinks, a structural feature of web documents, in order to maintain effectiveness while reducing computation cost.
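
The abstract does not describe the algorithm itself, so the following Python sketch only illustrates the general idea of using link structure in place of pairwise similarity. It assumes, hypothetically, that hyperlinks are treated as undirected edges and that each connected component forms one cluster; `cluster_by_hyperlinks` and its arguments are illustrative names, not the paper's.

```python
def cluster_by_hyperlinks(pages, links):
    """Group pages into clusters that are connected through hyperlinks.

    pages: iterable of page identifiers (e.g. URLs).
    links: iterable of (source, target) pairs; links that point outside
           `pages` are ignored.
    """
    parent = {page: page for page in pages}

    def find(page):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for source, target in links:
        if source in parent and target in parent:
            parent[find(source)] = find(target)

    clusters = {}
    for page in parent:
        clusters.setdefault(find(page), set()).add(page)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())
```

Because each link is visited once, the work grows with the number of links rather than with the number of document pairs, which is one way a link-based method could avoid the cost of all-pairs similarity.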

A Study on Constructing Approach of Enterprise Document Management Architecture in Semiconductor Business (반도체 산업에서의 Enterprise Document Management Architecture 구현에 관한 연구)

  • 장현성;이영중;송하석;한영준;안정삼
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2001.10a
    • /
    • pp.11-14
    • /
    • 2001
  • The systematic construction and reuse of technology related to product development and production has been of the utmost importance for the semiconductor industry, which depends on process and equipment. As a result, numerous paper outputs have been produced in the course of information management, from the creation of technologies through their recycling and disposal. In this research, the technology and documents necessary for business management in semiconductor manufacturing were classified in an effort to solve these problems, and a document management architecture was modeled at the enterprise level, with a security system set up to prevent unauthorized disclosure of product development technology to third parties. In particular, the product and process specifications are designed to ensure real-time response in the interface with the production system, in order to shorten development lead time and improve productivity. This paper discusses the modeling approach, the strategy for constructing the system, and its results.

A Study on Plagiarism Detection and Document Classification Using Association Analysis (연관분석을 이용한 효과적인 표절검사 및 문서분류에 관한 연구)

  • Hwang, Insoo
    • The Journal of Information Systems
    • /
    • v.23 no.3
    • /
    • pp.127-142
    • /
    • 2014
  • Plagiarism occurs when content is copied without permission or citation, and the problem has grown rapidly in the digital era because of the resources available on the World Wide Web. An important task in plagiarism detection is measuring and determining similar text portions between a given pair of documents. One of the main difficulties of this task is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. To handle this problem, this paper proposes using association analysis from data mining to detect plagiarism. The method is able to detect common actions performed by plagiarists, such as word deletion, insertion, and transposition, making it possible to obtain plausible portions of plagiarized text. Experimental results employing an unsupervised document classification strategy showed that the proposed method outperformed traditionally used approaches.

A Study on Knowledge Media System using the Concept of Special Document (전문 문서 개념을 사용한 지식 미디어 시스템에 관한 연구)

  • 손영수
    • Journal of the Korean Society of Manufacturing Technology Engineers
    • /
    • v.5 no.4
    • /
    • pp.63-73
    • /
    • 1996
  • Knowledge in specialized fields has been changing rapidly in both quantity and quality. Hypermedia, as a knowledge-based system, is so rigidly fixed in its mutually linked media that it can hardly be said to provide information from the user's point of view. In this paper, we propose a way of offering intelligent and flexible information that matches the user's demand by selecting, searching, and composing hypertext from the user's point of view, through the design of a knowledge media system. Three concepts are used to realize the knowledge media system: special document, agent system, and ontology. The special document is a ensure that its activities are coordinated with those of the others within the community, providing a uniform control mechanism. Finally, ontology is a language for exchanging knowledge, in the form of messages exchanged among agents to ensure proper interaction among them. The combination of these three concepts is used to design the prototype of the knowledge media system.

A Database Approach for Modeling and Querying XML Documents

  • Panseop Shin;Kim, Jeong-Eun;Lee, Jaeho;Haechull Lim
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.703-706
    • /
    • 2000
  • In recent years, XML applications have been developed in diverse areas. In particular, XML document repository systems backed by databases are being widely developed. Previous research on XML repository systems has several shortcomings: limitations on updating and retrieving XML documents, limitations in designing a formal retrieval algorithm, and data redundancy. To solve these problems, this paper suggests relational database schemas that overcome the limitations on updating, retrieving, and rebuilding documents. It also suggests a query translation strategy using two-phase translation, which consists of a pattern analysis phase and an SQL generation phase.
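
The two-phase translation named in this abstract can be sketched in Python as follows. The abstract gives neither the schema nor the query language, so this sketch assumes a hypothetical single table `element(id, parent_id, tag, text)` and supports only simple absolute paths such as `/book/title`; the function names are illustrative.

```python
import sqlite3  # used by the usage example to run the generated SQL

def analyze_pattern(path):
    """Phase 1: split a simple absolute path like '/book/title' into tags."""
    steps = [step for step in path.split("/") if step]
    if not path.startswith("/") or not steps:
        raise ValueError("only simple absolute paths are supported")
    return steps

def generate_sql(steps):
    """Phase 2: build a parameterized self-join over the assumed table."""
    joins = ["element AS e0"]
    conditions = ["e0.parent_id IS NULL", "e0.tag = ?"]
    for i in range(1, len(steps)):
        # Each path step joins a child row to the row matched by the previous step.
        joins.append(f"JOIN element AS e{i} ON e{i}.parent_id = e{i - 1}.id")
        conditions.append(f"e{i}.tag = ?")
    last = len(steps) - 1
    sql = (
        f"SELECT e{last}.text FROM "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(conditions)
    )
    return sql, steps
```

As a usage example, the generated statement can be run against an in-memory SQLite database: after creating the `element` table and inserting a `book` root with a `title` child, `conn.execute(*generate_sql(analyze_pattern("/book/title")))` returns the text of the title element. Passing the tag names as parameters keeps them out of the SQL string.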

An Effective Incremental Content Clustering Method for the Large Documents in U-learning Environment (U-learning 환경의 대용량 학습문서 관리를 위한 효율적인 점진적 문서)

  • Joo, Kil-Hong;Choi, Jin-Tak
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.9
    • /
    • pp.859-872
    • /
    • 2004
  • With the rapid advance of computer and communication technology, the education environment is moving toward ubiquitous learning (u-learning), in which learners select and organize the contents, time, and order of learning by themselves. Since the amount of educational information available through the internet is increasing rapidly, a way to manage documents effectively is necessary. Document clustering groups documents by subject by classifying a set of documents according to the similarity among them. Accordingly, document clustering can be used in exploring and searching for documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering method for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, removal of stop words is proposed to improve the correctness of the clustering.
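
The incremental assignment described in this abstract can be illustrated with a small Python sketch. The abstract does not specify the similarity measure or the decision rule, so cosine similarity over term counts, the `threshold` value, and the tiny `STOP_WORDS` set are assumptions made here for illustration.

```python
import math
from collections import Counter

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}

def vectorize(text):
    """Turn text into term counts, dropping stop words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def assign(clusters, text, threshold=0.3):
    """Assign text to the most similar existing cluster, or start a new one.

    clusters: list of Counters holding summed term counts per cluster.
    Returns the index of the cluster that received the document.
    """
    vec = vectorize(text)
    best, best_sim = None, threshold
    for i, centroid in enumerate(clusters):
        sim = cosine(centroid, vec)
        if sim >= best_sim:
            best, best_sim = i, sim
    if best is None:
        clusters.append(vec)
        return len(clusters) - 1
    # Summed counts act as an unnormalized centroid; cosine ignores scale.
    clusters[best].update(vec)
    return best
```

Only the new document is compared against the existing cluster summaries, so earlier documents never need to be re-clustered when more arrive.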

Weighted Bayesian Automatic Document Categorization Based on Association Word Knowledge Base by Apriori Algorithm (Apriori알고리즘에 의한 연관 단어 지식 베이스에 기반한 가중치가 부여된 베이지안 자동 문서 분류)

  • 고수정;이정현
    • Journal of Korea Multimedia Society
    • /
    • v.4 no.2
    • /
    • pp.171-181
    • /
    • 2001
  • The previous Bayesian document categorization method has two problems: it requires a lot of time and effort for word clustering, and it hardly reflects the semantic information between words. In this paper, we propose a weighted Bayesian document categorization method based on an association word knowledge base acquired by a mining technique. The proposed method constructs a weighted association word knowledge base using the documents in the training set. A classifier using Bayesian probability then categorizes documents based on the constructed association word knowledge base. To evaluate the performance of the proposed method, we compare our experimental results with those of the weighted Bayesian document categorization method using a vocabulary dictionary built by mutual information, the weighted Bayesian document categorization method, and the simple Bayesian document categorization method. The experimental results show that the weighted Bayesian categorization method using the association word knowledge base improves performance by 0.87%, 2.77%, and 5.09% over the weighted Bayesian categorization method using a vocabulary dictionary by mutual information, the weighted Bayesian method, and the simple Bayesian method, respectively.
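
A weighted Bayesian classifier of the general kind this abstract describes might look like the Python sketch below. It is a plain multinomial naive Bayes with Laplace smoothing in which each word's log-likelihood is scaled by a weight. How the paper derives its weights from the association word knowledge base is not stated in the abstract, so the `weights` dictionary here is a hypothetical stand-in supplied by the caller.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Collect class and term counts from (label, words) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for label, words in labeled_docs:
        class_counts[label] += 1
        term_counts[label].update(words)
        vocab.update(words)
    return class_counts, term_counts, vocab

def classify(model, words, weights=None):
    """Return the label with the highest weighted log-posterior."""
    class_counts, term_counts, vocab = model
    weights = weights or {}  # hypothetical per-word weights; default is 1.0
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for word in words:
            if word not in vocab:
                continue  # words unseen in training carry no evidence
            likelihood = (term_counts[label][word] + 1) / denom  # Laplace smoothing
            score += weights.get(word, 1.0) * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With all weights left at 1.0 this reduces to the simple Bayesian method used as a baseline in the abstract; raising a word's weight makes its evidence count more heavily toward the classes where it is frequent.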

Design and Implementation of a Retrieval Server for Virtual Documents in the MIRAGE-III Digital Library (MIRAGE-III 디지털도서관에서 가상문서 검색 서버의 설계 및 구현)

  • Lee, Yong-Bae;Maeng, Sung-Hyon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.2
    • /
    • pp.219-230
    • /
    • 2002
  • One of the most important functions digital libraries need to offer is helping users find necessary information in a distributed environment in the most efficient and effective manner. To meet this goal, it is desirable to link scattered pieces of information and present them as a logically coherent whole when the user wants it, so that he or she does not need to know their physical location. A virtual document is an integrated document in which all or part of the physical documents stored in a specific repository are linked dynamically. Our MIRAGE-III digital library system provides content-based retrieval of physical documents and of virtual documents in XML. The system provides retrieval of partial documents, attributes, hierarchical structures, and linked documents based on structured documents such as XML or SGML. In this paper we describe a methodology for the design and implementation of the query processor and retrieval server in the MIRAGE-III digital library system.