• Title/Summary/Keyword: Document research

Document Image Binarization by GAN with Unpaired Data Training

  • Dang, Quang-Vinh;Lee, Guee-Sang
    • International Journal of Contents / v.16 no.2 / pp.8-18 / 2020
  • Data is critical in deep learning but the scarcity of data often occurs in research, especially in the preparation of the paired training data. In this paper, document image binarization with unpaired data is studied by introducing adversarial learning, excluding the need for supervised or labeled datasets. However, the simple extension of the previous unpaired training to binarization inevitably leads to poor performance compared to paired data training. Thus, a new deep learning approach is proposed by introducing a multi-diversity of higher quality generated images. In this paper, a two-stage model is proposed that comprises the generative adversarial network (GAN) followed by the U-net network. In the first stage, the GAN uses the unpaired image data to create paired image data. With the second stage, the generated paired image data are passed through the U-net network for binarization. Thus, the trained U-net becomes the binarization model during the testing. The proposed model has been evaluated over the publicly available DIBCO dataset and it outperforms other techniques on unpaired training data. The paper shows the potential of using unpaired data for binarization, for the first time in the literature, which can be further improved to replace paired data training for binarization in the future.

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection (문서측 자질선정을 이용한 고속 문서분류기의 성능향상에 관한 연구)

  • Lee, Jae-Yun
    • Journal of Information Management / v.36 no.4 / pp.51-69 / 2005
  • High-speed classification has become an important research issue in text categorization systems. A fast text categorization technique named feature value voting was recently introduced, but its classification accuracy is not as good as its classification speed. We present a novel approach to feature selection, named document-side feature selection, and apply it to the feature value voting method. In this approach there is no feature selection process in the learning phase; instead, real-time feature selection is executed in the classification phase. Our results show that feature value voting with document-side feature selection yields a fast and accurate text classification system whose classification performance appears competitive with Support Vector Machines, a state-of-the-art text categorization algorithm.
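
The idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the voting weights or the selection criterion, so everything here is an assumption: each term votes for a category with weight P(category | term) estimated from document counts, and document-side selection keeps only the `top_k` terms of the incoming document whose strongest vote is largest. The names `train_votes`, `classify`, and `top_k` are hypothetical.

```python
from collections import defaultdict


def train_votes(docs):
    """Learn per-term votes from (tokens, label) pairs.

    Assumption: the vote of a term for a label is the fraction of
    training documents containing the term that carry that label.
    No feature selection happens in this learning phase.
    """
    term_label = defaultdict(lambda: defaultdict(int))
    term_total = defaultdict(int)
    for tokens, label in docs:
        for term in set(tokens):
            term_label[term][label] += 1
            term_total[term] += 1
    return {
        term: {label: n / term_total[term] for label, n in labels.items()}
        for term, labels in term_label.items()
    }


def classify(tokens, votes, top_k=2):
    """Classify one document by summing the votes of its selected terms.

    Document-side selection (assumed criterion): keep the top_k known
    terms of this document ranked by their strongest single vote.
    Returns None when no term of the document was seen in training.
    """
    known = [t for t in set(tokens) if t in votes]
    selected = sorted(known, key=lambda t: (-max(votes[t].values()), t))[:top_k]
    tally = defaultdict(float)
    for term in selected:
        for label, weight in votes[term].items():
            tally[label] += weight
    if not tally:
        return None
    # Ties are broken alphabetically by label so the result is deterministic.
    return max(sorted(tally), key=tally.get)
```

Because selection is applied per document at classification time, the learned vote table never has to be rebuilt when `top_k` changes.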

Clustering XML Documents Considering The Weight of Large Items in Clusters (클러스터의 주요항목 가중치 기반 XML 문서 클러스터링)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.1-8 / 2007
  • As XML, a data exchange language for the advanced Internet, is used for more and more web documents, XML web documents have become a target of information retrieval. Accordingly, there is research on the structure, integration, and retrieval of XML documents. This paper proposes a clustering method for XML documents based on frequent structures, as basic research toward efficient query processing and retrieval. First, the trees representing XML documents are decomposed and frequent structures are extracted from them. Second, treating the frequent structures as items of transactions, we perform clustering that considers the weight of large items in order to adjust cluster creation and cluster cohesion. Third, we show the effectiveness of our method through experiments that compare it with previous methods.
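
The first step described here (decomposing document trees and treating frequent structures as transaction items) can be sketched as follows. This is only an illustration under stated assumptions: a "structure" is taken to be a root-to-node tag path, and "frequent" means appearing in at least `min_support` documents. The weighted large-item clustering itself is not reproduced, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Decompose one XML document into its set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(documents, min_support):
    """Return the paths that occur in at least min_support documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tag_paths(doc))  # each path counted once per document
    return {path for path, n in counts.items() if n >= min_support}


def to_transaction(xml_text, frequent):
    """Represent a document as a transaction whose items are frequent paths."""
    return tag_paths(xml_text) & frequent
```

The resulting transactions are plain sets, so any transaction-clustering criterion can be applied to them afterwards.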

Automatic Reading System for On-off Type DNA Chip

  • Ryu, Mun-Ho;Kim, Jong-Dae;Kim, Jong-Won
    • Journal of Information Processing Systems / v.2 no.3 s.4 / pp.189-193 / 2006
  • In this study we propose an automatic reading system for diagnostic DNA chips. We define a general specification for an automatic reading system and propose a possible implementation method. The proposed system performs the whole reading process automatically without any user intervention, covering image acquisition, image analysis, and report generation. We applied the system to the automatic report generation of a commercialized DNA chip for cervical cancer detection. The fluorescence image of the hybridization result was acquired with a GenePix™ scanner using its library running in HTML pages. The processing of the acquired image and the report generation were executed by a component object module programmed with Microsoft Visual C++ 6.0. To generate the report document, we made an HWP 2002 document template with marker strings that are searched for and replaced with the corresponding information, such as patient information and diagnosis results. The proposed system generates the report document by reading the template and replacing the marker strings with the resulting contents. The system is expected to facilitate the use of diagnostic DNA chips for mass screening by automating the conventional manual reading process, shortening its processing time, and quantifying the reading criteria.
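
The marker-string report generation described here can be sketched in a few lines. This is not the authors' implementation, which worked on an HWP 2002 document through a C++ component. The sketch assumes a plain-text template, and the marker syntax (`@@NAME@@`) and the function name `fill_template` are hypothetical.

```python
def fill_template(template, values):
    """Replace every marker string in the template with its value.

    Raises KeyError if a marker is absent from the template, so a
    mismatch between template and data is caught instead of being
    silently ignored.
    """
    report = template
    for marker, value in values.items():
        if marker not in report:
            raise KeyError(f"marker not found in template: {marker}")
        report = report.replace(marker, str(value))
    return report
```

A call such as `fill_template("Patient: @@NAME@@", {"@@NAME@@": "Sample Patient"})` returns the filled report text. Choosing markers that cannot occur in ordinary report text keeps the replacement unambiguous.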

A Study on Problems of Certification System in International Electronic Commerce (전자무역(電子貿易)에서 제도상(制度上) 인증(認證)시스템의 문제점(問題點)에 관한 고찰(考察))

  • Oh, Hyon-Sok
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.23 / pp.291-320 / 2004
  • Because electronic transactions using electronic documents are carried out without meeting in person, another party's identity can be used illegally without notice, and it is difficult to verify the authenticity of a transaction. It is very hard to determine whether an electronic document was forged while being submitted, and it is also difficult to keep transmissions secret. To solve these problems, a certification system based on cryptographic techniques is indispensable. A legal basis is also needed for treating the electronic document as the functional equivalent of the paper document. There are now many commercial certification service providers (CPS) such as Identrus, Bolero, and TEDI, but their establishment of CPS, certification processes, guidelines, and so on differ from one CPS to another, which can confuse users. To introduce and develop electronic certification in international, rather than domestic, electronic commerce, certification authorities need to be authorized and operated under a uniform regulatory basis. However, because the laws and guidelines related to electronic certification systems differ among nations and international organizations, they need to be compared. In conclusion, the most important step toward resolving the problems surrounding certification and developing a certification system for international electronic commerce is to establish uniform rules of international electronic certification that each nation recognizes, or at least to harmonize the laws and guidelines of each nation.

A Study on Digitization of Sea Transport Document - Focusing on ESS-Databridge - (해상운송서류 전자화에 관한 소고 - ESS-Databridge를 중심으로 -)

  • LIM, Sung-Chul
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.65 / pp.95-116 / 2015
  • Several attempts have been made so far to digitalize sea transport documents. Notable examples are SeaDocs, Bolero, e-B/L Korea, and ESS-Databridge. ESS-Databridge was established in 2003 with the aim of promoting the use of electronic alternatives to shipping documents. The ESS-Databridge system was piloted from 2005 and went live in January 2010. It operates under a private legal framework, the Databridge Services and Users Agreement (DSUA). In the ESS-Databridge system, only the user who is in control of the original bill of lading is able to indorse it to another user. Once the indorsement is effected, and unless the indorsee decides to return the documents, the indorser loses control and retains access only to an electronic document marked 'copy' for its records. A feature that appears to have been crucial to the success of the CargoDocs service is that e-B/Ls produced using ESS-Databridge appear visually identical to the paper documents. ESS-Databridge may be even more successful if legislators take certain steps that increase uniformity and certainty in electronic transport documentation.

Personal Electronic Document Retrieval System Using Semantic Web/Ontology Technologies (시멘틱 웹/온톨로지 기술을 이용한 개인용 전자문서 검색 시스템)

  • Kim, Hak-Lae;Kim, Hong-Gee
    • The Journal of Society for e-Business Studies / v.12 no.1 / pp.135-149 / 2007
  • There are many kinds of applications and software components for managing files on a local computer, but it is very difficult to organize personal documents consistently and to search for the desired ones precisely. In this paper, we present the development of a document management and retrieval tool named Ontalk. Our system provides a semi-automatic metadata generator and an ontology-based search engine for electronic documents. Ontalk can create and import various ontologies in RDFS or OWL for describing the metadata. Built upon .NET technology, the system can easily communicate with, or be flexibly plugged into, many different programs.

The Information value-based document management technique using the Information Lifecycle Management Theory (정보주기관리 이론에 근거한 정보가치 기반문서 관리기법)

  • Im Ji-Hoon;Lee Chil-Gee;Lee Young-Joong
    • Journal of the Korea Society for Simulation / v.14 no.4 / pp.19-30 / 2005
  • Due to the explosive expansion of enterprise R&D efforts to secure technological predominance, the volume of technical information is increasing rapidly, and valuing this information has become ever more important. Systematic management, safeguarding, and accumulation of these intellectual properties of the enterprise are therefore in very high demand. Much research has been carried out, and many studies are in progress, to derive the optimum solution for managing information retention policy, processes, execution methods, the hardware used to handle the information, and so on. The intent of this thesis is to recommend a way for the enterprise to evaluate the value of its data and to suggest a method for managing these intellectual properties using Information Lifecycle Management theory, which manages data according to its business value. The decision on data value and retention cycle is based on the analytic method of nonparametric regression. An experiment was carried out by applying the method to an Enterprise Document Management System to present the suitable retention cycle according to the value and the various attributes of the data.

XQL Query Processing System using XML Views (XML뷰를 이용한 XQL질의처리 시스템)

  • 김천식;손기락
    • Journal of Korea Multimedia Society / v.5 no.2 / pp.129-140 / 2002
  • XML has become a standard for exchanging document data on the web. Currently, most commercial data are stored in relational database systems. Transferring data between documents and a relational database system therefore requires converting it into XML form. The purpose of this paper is to study a query processing system that makes it easy and convenient to pose queries and derive results from document data stored in a relational database. We have designed an XML view called R2X (Relational To XML). With this R2X view, users can view a relational database as XML documents. Using R2X views, we design and implement a query processing system that makes it convenient to form queries in an XML query language called XQL.
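
The notion of viewing a relational table as an XML document can be illustrated with a small sketch. It does not reproduce R2X or XQL, whose definitions are not given in the abstract. It assumes an SQLite database, a hypothetical `table_as_xml` helper that maps each row to a `<row>` element with one child per column, and the limited XPath subset of Python's `xml.etree.ElementTree` as a stand-in query language.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_as_xml(conn, table):
    """Expose one relational table as an XML element tree.

    Assumptions: `table` is a trusted identifier (it is interpolated
    into the SQL text), and column names are valid XML element names.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, val in zip(columns, row):
            # NULL values become empty elements; everything else is text.
            ET.SubElement(row_el, col).text = "" if val is None else str(val)
    return root


def titles_for_year(root, year):
    """Path-style query over the view: titles of rows with a given year."""
    return [e.text for e in root.findall(f"row[year='{year}']/title")]
```

Once the view exists, the query is written against the XML structure alone and no longer needs to know the relational schema's SQL.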

Design and implementation of an XML Repository System supporting Document Version (버전을 지원하는 XML 저장관리 시스템 설계 및 구현)

  • Son, Chung-Beom;Oh, Kyoung-Keun;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.10D no.1 / pp.13-22 / 2003
  • Recently, as the importance of managing Internet documents has increased, XML repository systems that store, retrieve, and manage large XML documents have been actively researched. Version management for XML documents is required in XML applications such as patent documents, software design, and system manuals, where modified documents have to be managed. In this paper, we propose a data model, based on a fragmentation model, that supports document versioning. We also design and implement an XML repository system supporting document versioning. A performance evaluation shows that our system outperforms the existing repository system.