• Title/Summary/Keyword: Document research

Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • Park, Jongin;Kim, Namgyu
    • Knowledge Management Research / v.21 no.1 / pp.99-116 / 2020
  • Text is the most widely used means of exchanging and expressing knowledge and information in the real world. Recently, research on structuring unstructured text data for text analysis has been actively pursued. One of the most representative document embedding methods (i.e., doc2Vec) generates a single vector for each document using all the words in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding algorithms map each document to only one vector, so it is difficult to represent a complex document covering interdisciplinary subjects properly with the traditional approach. In this paper, we introduce a multi-vector document embedding method to overcome these limitations. After introducing the previous study on multi-vector document embedding, we visually analyze the effects of the method. First, the new method vectorizes the document using only predefined keywords instead of all words. Second, the new method decomposes the various subjects included in the document and generates multiple vectors for each document. Experiments on about three thousand academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. With the multi-vector method, we confirmed that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.
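
The two ideas in this abstract (embed only predefined keywords, and emit one vector per subject rather than one per document) can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword vectors, the keyword-to-topic mapping, and the use of simple averaging are all assumptions made for illustration.

```python
from collections import defaultdict

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def embed_document(tokens, keyword_vectors, keyword_topic):
    """Return (single_vector, {topic: vector}) using only predefined keywords.

    keyword_vectors and keyword_topic are hypothetical inputs: a vector and
    a subject label for every predefined keyword.
    """
    hits = [t for t in tokens if t in keyword_vectors]  # ignore non-keywords
    if not hits:
        return None, {}
    # Single-vector baseline: all subjects are blended into one point.
    single = mean_vector([keyword_vectors[t] for t in hits])
    # Multi-vector variant: one averaged vector per subject.
    by_topic = defaultdict(list)
    for t in hits:
        by_topic[keyword_topic[t]].append(keyword_vectors[t])
    multi = {topic: mean_vector(vs) for topic, vs in by_topic.items()}
    return single, multi

# Toy, hand-made 2-D keyword vectors (hypothetical values).
keyword_vectors = {"gene": [1.0, 0.0], "protein": [0.8, 0.2],
                   "network": [0.0, 1.0], "router": [0.2, 0.8]}
keyword_topic = {"gene": "biology", "protein": "biology",
                 "network": "computing", "router": "computing"}
tokens = ["the", "gene", "and", "protein", "network", "uses", "router"]
single, multi = embed_document(tokens, keyword_vectors, keyword_topic)
```

In this toy case the single vector lands midway between the two subjects, while the per-subject vectors stay close to their own keywords, which is the interference effect the abstract describes.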

The Design and Implementation of SGML Document Editing System Using Document Structure Information (문서 구조정보를 이용한 SGML 문서 편집 시스템의 설계 및 구현)

  • Kim, Chang-Su;Jo, In-June;Jung, Hoe-Kyung
    • The Journal of Engineering Research / v.3 no.1 / pp.21-27 / 1998
  • This paper describes the design and implementation of a system for editing SGML document instances using the document structure information of an SGML DTD. The system uses a structure window that expresses the logical structure of the document, so that users can edit SGML documents without making editing mistakes and update them easily. It supports the editing process with tools for elements, attributes, and entities, and validates the resulting document with an SGML parser. It also supports Korean and English text using KS 5601. The user interface of the proposed SGML document editing system is built on the common controls of Windows 95.

The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents (비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현)

  • Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.23-33 / 2014
  • Research in information retrieval is shifting from rapidly finding large amounts of information to accurately finding the information that is needed. Personalization and semantic web technology are the key technologies. Automatic indexing and processing of web documents have gone beyond the research stage and appear in practical services. However, there is a lack of research on information retrieval for attached documents other than web documents. In this paper, we describe a method that analyzes the text content of unstructured documents prepared in text, Word, and HWP formats and constructs an OWL ontology from them. We build the TBox of the document ontology, select the resources that can be obtained from the documents, and implement a system that uses them as instances of the constructed document ontology. Automatic ontology construction of this kind for unstructured documents can be used effectively in information retrieval and document management systems that apply semantic technology to such documents.

Data Model for Document Exchange of Construction Projects (건설 프로젝트 문서교환을 위한 데이터모델)

  • An Sun-Ju;Son Bo-Sik;Lee Hyun-Soo
    • Proceedings of the Korean Institute Of Construction Engineering and Management / autumn / pp.569-572 / 2003
  • A construction process involves many designers, engineers, contractors, consulting engineers, and government officials. Thus, it is essential to promote collaboration among these participants through effective document exchange. There have been efforts to improve the efficiency of document exchange through the Web, and XML/EDI is recommended as the method. The purpose of this research was therefore to establish a data model for managing document information in document exchange among construction participants using web-based XML/EDI. The research proposes a method of modeling document information for the systematic management of document information exchanged mainly by XML/EDI, and explains how the document information stored in the database is applied. It also classifies construction documents according to the relations among their information.

SATS: Structure-Aware Touch-Based Scrolling

  • Kim, Dohyung;Gweon, Gahgene;Lee, Geehyuk
    • ETRI Journal / v.38 no.6 / pp.1104-1113 / 2016
  • Non-linear document navigation refers to the process of repeatedly reading a document at different levels to provide an overview, including selective reading to search for useful information within a document under time constraints. Currently, this function is not supported well by small-screen tablets. In this study, we propose the concept of structure-aware touch-based scrolling (SATS), which allows structural document navigation using region-dependent touch gestures for non-sequential navigation within tablets or tablet-sized e-book readers. In SATS, the screen is divided into four vertical sections representing the different structural levels of a document, where dragging into the different sections allows navigating from the macro to micro levels. The implementation of a prototype is presented, as well as details of a comparative evaluation using typical non-sequential navigation tasks performed under time constraints. The results showed that SATS obtained better performance, higher user satisfaction, and a lower usability workload compared with a conventional structural overview interface.
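
The region-dependent gesture idea can be sketched as a mapping from touch position to structural level, followed by a jump to the next unit at that level. This is only an illustration under stated assumptions: the four level names, the choice of side-by-side sections indexed by the x coordinate, and both function names are hypothetical and not taken from the SATS prototype.

```python
from bisect import bisect_right

# Hypothetical level names, ordered from macro to micro.
LEVELS = ["chapter", "section", "subsection", "paragraph"]

def level_for_touch(x, screen_width, levels=LEVELS):
    """Map a horizontal touch position to the structural level of its screen section."""
    if not 0 <= x < screen_width:
        raise ValueError("touch is outside the screen")
    index = int(x * len(levels) / screen_width)  # equal-width sections
    return levels[index]

def next_unit_start(position, starts):
    """Return the start offset of the next unit after `position`.

    `starts` is a sorted list of offsets where units of one level begin.
    If there is no later unit, the last start is returned.
    """
    i = bisect_right(starts, position)
    return starts[i] if i < len(starts) else starts[-1]

# Example: a drag that begins at x=200 on an 800-pixel-wide screen
# navigates at the second level, jumping to the next unit of that level.
level = level_for_touch(200, 800)
section_starts = [0, 100, 250, 400]  # hypothetical offsets
target = next_unit_start(120, section_starts)
```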

Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan;Lee, Junwoo;Park, So-Young;Lee, Changki
    • ETRI Journal / v.39 no.4 / pp.443-454 / 2017
  • In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their numbers with no predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step with titles and an expansive clustering step with descriptions. During the accurate clustering step, an automatically tagged set is constructed as a result. This set is utilized to learn a high-performance document vector. During the expansive clustering step, more applications are then classified using this document vector. Experimental results showed that the purity of the proposed model increased by 0.19, and the entropy decreased by 1.18, compared with the K-means algorithm. In addition, the mean average precision improved by more than 0.09 in a comparison with a support vector machine classifier.
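
For readers unfamiliar with the two clustering metrics reported here, the following sketch computes purity and entropy in their common textbook form. The abstract does not state the exact formulas used in the paper, so the base-2 logarithm and the size-weighted averaging are assumptions.

```python
from collections import Counter, defaultdict
from math import log2

def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Purity: fraction of items that belong to the majority label of their cluster
    (higher is better). Entropy: size-weighted average of the label entropy
    inside each cluster, in bits (lower is better).
    """
    clusters = defaultdict(list)
    for c, t in zip(cluster_ids, true_labels):
        clusters[c].append(t)
    n = len(true_labels)
    purity = sum(max(Counter(m).values()) for m in clusters.values()) / n
    entropy = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((k / size) * log2(k / size) for k in Counter(members).values())
        entropy += (size / n) * h
    return purity, entropy

# Toy example: cluster 0 mixes two labels, cluster 1 is pure.
purity, entropy = purity_and_entropy([0, 0, 0, 1, 1, 1],
                                     ["a", "a", "b", "b", "b", "b"])
```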

Design and Implementation of the Document HTML System for Preserving Content Integrity

  • Hyun Cheon Hwang;Ji Su Park;Jin Gon Shon
    • Journal of Information Processing Systems / v.19 no.3 / pp.334-346 / 2023
  • PDF-based electronic documents are widely used in customer communication to deliver personalized content from an enterprise to a customer. However, PDF-based electronic documents with paper-style layouts are not suitable for mobile environments because of low readability and a lack of interactivity. Even though HTML is an essential language in mobile environments, PDF-based electronic documents are still used because they offer content integrity verification through a digital signature. This means that users sacrifice user experience in a mobile environment for the sake of content integrity by using paper-layout electronic documents. In this research, we design the Document HTML specification by setting the Document HTML conformance, adding extended meta tags, and signing the message digest with a digital signature based on public key infrastructure (PKI). Furthermore, we implemented the Document HTML system, which has REST API services to generate and verify the Document HTML, and verified the approach experimentally. As a result, we confirmed that the Document HTML provides both content integrity and a good user experience on mobile. The Document HTML is therefore expected to be an alternative document format for delivering personalized content from an enterprise to a customer in a mobile environment, in place of paper-layout electronic documents such as PDF.
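
The digest half of such an integrity check can be sketched with the Python standard library. This is an assumption-laden illustration, not the Document HTML specification: the meta tag name `x-content-digest` and the function names are hypothetical, SHA-256 is an assumed hash choice, and the PKI signature over the digest is deliberately left out because it requires a cryptography library and key material.

```python
import hashlib
import hmac

def content_digest(body_html: str) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(body_html.encode("utf-8")).hexdigest()

def digest_meta_tag(body_html: str) -> str:
    """Build a meta tag carrying the digest (tag name is hypothetical).

    In a real system the digest would additionally be signed with a
    private key; that step is omitted in this sketch.
    """
    return f'<meta name="x-content-digest" content="{content_digest(body_html)}">'

def is_intact(body_html: str, expected_digest: str) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(content_digest(body_html), expected_digest)

original = "<p>Your invoice total is 120 USD.</p>"
stored = content_digest(original)
tampered = "<p>Your invoice total is 20 USD.</p>"
```

With these values, `is_intact(original, stored)` is true and `is_intact(tampered, stored)` is false. A digest alone only detects accidental or naive changes; the signature described in the abstract is what prevents an attacker from replacing content and digest together.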

Enhancing Document Security with Computer Generated Hologram Encryption: Comprehensive Solution for Mobile Verification and Offline Decryption

  • Leehwan Hwang;Seunghyun Lee;Jongsung Choi
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.169-175 / 2024
  • In this paper, we introduce a novel approach to enhancing document security by integrating Computer Generated Hologram (CGH) encryption technology with a system for document encryption, printing, and subsequent verification using a smartphone application. The proposed system encrypts documents using CGH technology and prints the result on the edges of the document, which simplifies document verification and validation through a smartphone application. Furthermore, the system uses high-resolution smartphone cameras to perform online verification of the original document and supports offline document decryption, ensuring tamper detection even in environments without internet connectivity. This research contributes to the development of a comprehensive and versatile solution for document security and integrity, with applications in various domains.

An Archival Study on the Arrangement and Description of Old Document(Diploma) (고문서 정리(整理)에 대한 기록학적 연구 - 새로운 고문서 정리 방법의 모색을 위하여 -)

  • Cho, Kyung-Koo
    • The Korean Journal of Archival Studies / no.7 / pp.37-74 / 2003
  • An old document (diploma) is a historical and unique record, so it must be collected, arranged, and preserved for research as soon as possible. In particular, for old documents to be used effectively, the materials need to be arranged and described systematically on the basis of modern archival theory. The Kyujanggak Archives at Seoul National University has published 23 volumes of old document materials in its Old Document (Diploma) series. However, they are inconvenient for readers because the materials are classified and gathered only by genre, the titles and the order of the materials are not standardized, and there is no description of the content of each document. The Jangseo-gak Library at The Academy of Korean Studies has also published a series of old document materials, the Old Document (Diploma) Collection. The situation is no different, however, since it mixes materials classified and gathered by genre, family, academy, or local school, and a large part of the materials have neither titles nor descriptions of their content. For the arrangement and description of records, European and American archival science has established 1) the principle of provenance, 2) the principle of original order, 3) levels of control, and 4) collective description. These theories are valuable for the effective use of old documents. According to the principle of provenance, old document materials should be classified not by subject or genre but by family and person. Then, the materials collected by family or person should be arranged in their original order, both for more detailed arrangement and to reveal the relationships among them. This is the principle of original order.
The hierarchical management of old document materials, for example classifying them by record group, sub-group, series, item, and so on, is the concept of levels of control, and a comprehensive description of each level of the hierarchy is the concept of collective description. Let us apply these archival theories to the 34 pieces of Chung, Man-Seok's material in the Old Document (Diploma) series. First, collect the materials that were scattered across the genre-based series into a Chung, Man-Seok collection (the principle of provenance). Second, rearrange them chronologically (the principle of original order); comprehensive information about Chung, Man-Seok then emerges. For hierarchical management, we should establish a few concepts running from the general, large group to the specific, small item. They can be organized as follows: 1) record group (Chung, Man-Seok record group), 2) sub-group (personnel documents, property documents, family documents, social activity documents, political activity documents, etc.), 3) series (gyoji series, gyoseo series, yuji series, etc. within the personnel documents), 4) folder (a document with additions), and 5) item (a single document). According to the theory of collective description, at the record group level there should be a collective description consisting of Chung, Man-Seok's biography or a summary of the record group. Similarly, there should be a summary of the sub-group at the sub-group level and a summary of the series at the series level.

History Document Image Background Noise and Removal Methods

  • Ganchimeg, Ganbold
    • International Journal of Knowledge Content Development & Technology / v.5 no.2 / pp.11-24 / 2015
  • It is common for archive libraries to provide public access to collections of historical and ancient document images. Such document images often require specialized processing to remove background noise and become more legible. Document images may be contaminated with noise during transmission, scanning, or conversion to digital form. Noise can be categorized by its features, and similar patterns can be searched for in a document image to choose appropriate removal methods. In this paper, we propose a hybrid binarization approach that improves the quality of old documents by combining global and local thresholding. The article also reviews the kinds of noise that may appear in scanned document images and discusses some noise removal methods.
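
One way to combine global and local thresholding is sketched below. The abstract does not say how the paper combines the two, so the rule used here is an assumption: pixels far from the global threshold are decided globally, and ambiguous pixels are decided by the mean of their neighbourhood. The `margin` and `radius` parameters and all function names are hypothetical.

```python
def global_threshold(image):
    """Use the mean gray level of the whole image as the global threshold."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def local_mean(image, r, c, radius):
    """Mean gray level of the window around (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    window = [image[i][j]
              for i in range(max(0, r - radius), min(rows, r + radius + 1))
              for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    return sum(window) / len(window)

def hybrid_binarize(image, radius=1, margin=40):
    """Return a 0/1 image: 0 = ink (dark), 1 = background (light)."""
    t = global_threshold(image)
    out = []
    for r, row in enumerate(image):
        out_row = []
        for c, p in enumerate(row):
            if abs(p - t) > margin:
                # Clearly dark or clearly light: the global decision is enough.
                out_row.append(1 if p > t else 0)
            else:
                # Ambiguous pixel: compare with its neighbourhood instead.
                out_row.append(1 if p >= local_mean(image, r, c, radius) else 0)
        out.append(out_row)
    return out

# Toy 3x3 grayscale image: light background with one dark stroke pixel.
image = [[200, 200, 200],
         [200,  30, 200],
         [200, 200, 200]]
binary = hybrid_binarize(image)
```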