Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • Park, Jongin;Kim, Namgyu
    • Knowledge Management Research / v.21 no.1 / pp.99-116 / 2020
  • Text is the most widely used means of exchanging or expressing knowledge and information in the real world. Recently, research on structuring unstructured text data for text analysis has been actively conducted. One of the most representative document embedding methods (i.e., doc2Vec) generates a single vector for each document using all the words in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding algorithms map each document to only one vector, so it is difficult to properly represent a complex document covering interdisciplinary subjects. In this paper, we introduce a multi-vector document embedding method to overcome these limitations of traditional document embedding methods. After introducing the previous study on multi-vector document embedding, we visually analyze the effects of the method. First, the new method vectorizes the document using only predefined keywords instead of all its words. Second, the new method separates the various subjects included in the document and generates multiple vectors for each document. Experiments on about three thousand academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. With the multi-vector method, we confirmed that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.
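
The two ideas in this abstract (embedding only predefined keywords, and producing one vector per subject rather than one per document) can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D keyword vectors, the keyword-to-subject map, and the function names `single_vector` and `multi_vectors` are all hypothetical, and plain averaging stands in for whatever embedding model the paper actually uses.

```python
from collections import defaultdict

# Hypothetical toy data: 2-D vectors for predefined keywords only.
KEYWORD_VECTORS = {
    "neural": (1.0, 0.0),
    "network": (0.8, 0.2),
    "protein": (0.0, 1.0),
    "genome": (0.2, 0.8),
}
# Hypothetical assignment of each keyword to one subject.
KEYWORD_SUBJECT = {
    "neural": "computing",
    "network": "computing",
    "protein": "biology",
    "genome": "biology",
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def single_vector(tokens):
    """One vector per document: average every keyword, mixing all subjects."""
    kept = [KEYWORD_VECTORS[t] for t in tokens if t in KEYWORD_VECTORS]
    return mean_vector(kept) if kept else None


def multi_vectors(tokens):
    """One vector per subject: average keywords within each subject only."""
    by_subject = defaultdict(list)
    for t in tokens:
        if t in KEYWORD_VECTORS:  # non-keywords are ignored entirely
            by_subject[KEYWORD_SUBJECT[t]].append(KEYWORD_VECTORS[t])
    return {subject: mean_vector(vs) for subject, vs in by_subject.items()}
```

For a document that mentions both computing and biology keywords, `single_vector` lands roughly midway between the two subjects, while `multi_vectors` keeps a separate vector near each subject, which is the interference effect the abstract describes.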

Design of Document-HTML Generation Technique for Authorized Electronic Document Communication (공인전자문서 소통을 위한 Document-HTML 문서 생성 기법의 설계)

  • Hwang, Hyun-Cheon;Kim, Woo-Je
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.1 / pp.51-59 / 2021
  • Electronic document communication over digital channels is becoming increasingly important with the advent of the paperless age. PDF-based electronic documents replace paper documents by providing content integrity and independence from particular devices and software, but they do not provide a strong customer experience for mobile device users. HTML5-based electronic documents, on the other hand, offer an enhanced customer experience, such as responsive web technology for mobile device users, but are weak in content integrity because the HTML5 specification has no provision for it. In this paper, we design Document-HTML, which provides both content integrity and a strong customer experience by declaring HTML5 constraint rules and extended tags that contain a PKI-based digital signature. We analyze existing electronic documents used in a major financial enterprise to develop a sample. We also verify Document-HTML by experimenting with the sample HTML electronic communication documents and analyzing the PKI equation. In the future, a Document-HTML document can be used for authorized electronic document communication and can provide a strong customer experience in the mobile environment between an enterprise and a user.

A Study on Intelligent Document Processing Management using Unstructured Data (비정형 데이터를 활용한 지능형 문서 처리 관리에 관한 연구)

  • Kyoung Hoon Park;Kwang-Kyu Seo
    • Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.71-75 / 2024
  • This research focuses on efficiently processing unstructured data that contains various formulas, applying text mining techniques to the processing and management of the terms and rules of domestic insurance documents. Through parsing and compilation technology, document context, content, constants, and variables are automatically separated, and errors are verified in the order of the document and its logic to improve document accuracy. Through document debugging technology, errors in the document are identified in real time. Furthermore, it is necessary to predict the changes that intelligent document processing will bring to document management work, in particular its impact on documents and utilization tasks that are managed in duplicate because of various formulas, and to prepare the capabilities that will be needed in the future.

An Archival Study on the Arrangement and Description of Old Document(Diploma) (고문서 정리(整理)에 대한 기록학적 연구 - 새로운 고문서 정리 방법의 모색을 위하여 -)

  • Cho, Kyung-Koo
    • The Korean Journal of Archival Studies / no.7 / pp.37-74 / 2003
  • An Old document(Diploma) is a unique historical record, so it must be collected, arranged, and preserved for research as soon as possible. In particular, for Old documents(Diplomas) to be used effectively, the material needs to be arranged and described systematically on the basis of modern archival theory. The Kyujanggak Archives at Seoul National University has published 23 volumes of Old document(Diploma) material in the series titled "Old Document(Diploma)". However, these volumes seem to inconvenience readers, because the materials are classified and gathered only by genre, the titles and the order of the materials are not standardized, and there is no description of the content of each Old document(Diploma). The Jangseo-gak Library at The Academy of Korean Studies has also published a series of Old document(Diploma) material titled "Old Document(Diploma) Collection". The situation is no different, however, since materials classified and gathered by genre, family, academy, or local school are all mixed together, and a large part of the materials have neither titles nor descriptions of the content of each Old document(Diploma). For the arrangement and description of records, European and American archival science has established the theories of 1) the principle of provenance, 2) the principle of original order, 3) levels of control, and 4) collective description. These theories are valuable for the effective use of Old documents(Diplomas). From the viewpoint of the principle of provenance, Old document(Diploma) materials should be classified not by subject and genre but by family and person. Then, once collected by family or person according to the principle of provenance, the materials should be arranged in their original order for more detailed arrangement and, furthermore, for the work of finding the relationships among them. This is the so-called principle of original order.
The hierarchical management of Old document(Diploma) materials, for example classifying them by record group, sub-group, series, item, and so on, is the concept of levels of control, and a comprehensive description of each hierarchical level is the concept of collective description. Let us apply these archival theories to the 34 pieces of Chung, Man-Seok's material in the series titled "Old Document(Diploma)". First, collect the Old document(Diploma) materials, which were scattered across the series classified by genre, into Chung, Man-Seok's collection (the principle of provenance). Second, rearrange them chronologically (the principle of original order); we can then find comprehensive information about Chung, Man-Seok. For the hierarchical management of the Old document(Diploma) materials, we should establish several concepts ranging from the general, large group to the specific, small item. The concepts can be organized as follows: 1) record group (Chung, Man-Seok record group) - 2) sub-group (personnel document, property document, family document, social activity document, political activity document, etc.) - 3) series (gyoji-series, gyoseo-series, yuji-series, etc. in the personnel document) - 4) folder (document with additions) - 5) item (one document). According to the theory of collective description, at the level of the record group there should be a collective description consisting of Chung, Man-Seok's biography or a summary of the record group. Similarly, there should be a summary of the sub-group at the sub-group level and a summary of the series at the series level.

A Study on Security System of Document Image using Mixing Algorithm (합성 방식을 이용한 문서 화상의 보안 체계 연구)

  • 허윤석;김일경;박일남
    • The Journal of Information Technology / v.2 no.2 / pp.89-105 / 1999
  • In this paper, we present a countermeasure for various problems that occur in the secure communication of document images. We propose a security system for document image transmission that uses a mixing algorithm, so that a third party cannot perceive that information is being transmitted securely, instead of the existing schemes, which depend on the cryptographic strength of the security algorithm itself. To this end, the RM, DM, and RDM algorithms for mixing secure bits are proposed and applied to mixing a digital signature into a secure document and mixing a secure document into a non-secure document. The security system for document images involves not only a security scheme for document image transmission itself but also a digital signature scheme. The transmitter secretly embeds the signatures in the secure document, embeds that document in a non-secure document, and transfers it to the receiver. The receiver checks the signature and the document for forgery. Because the total amount of transmitted data and the image quality are about the same as those of the original document image, a third party cannot notice that signatures and a secure document are embedded in the document image. Thus, the probability of attack is reduced.

A Study on An Architecture of the Security improved Document DRM for preventing Information Leakage in Military Information System Environment (국방 정보시스템 환경에서 정보유출 방지를 위한 보안성이 강화된 문서 DRM 설계에 관한 연구)

  • Eom, Jung Ho
    • Journal of Korea Society of Digital Industry and Information Management / v.7 no.1 / pp.41-49 / 2011
  • We designed a security-improved document DRM to protect document-based military information transmitted in the military information system environment. A user should not be able to access documents unrelated to his/her role and duty, and should view only the documents appropriate for his/her role and security level, according to the security level of each document. We improved the security of document DRM by adding an access control module to the DRM server. Our system grants operation mode authorizations for a document by considering the user's role and security level and the security level of the document. It thereby prevents indiscriminate access to documents and damage to the confidentiality and integrity of information.
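
The access rule described in this abstract (operations depend on the user's role, the user's security level, and the document's security level) can be sketched as follows. This is only an illustration under stated assumptions, not the paper's DRM design: the `User` and `Document` classes, the `ROLE_OPERATIONS` table, the role names, and the rule that a higher number means a higher security level are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    role: str
    security_level: int  # assumption: higher number means higher clearance


@dataclass(frozen=True)
class Document:
    title: str
    security_level: int
    related_roles: frozenset  # roles whose duties the document concerns


# Hypothetical table of operation modes each role may ever receive.
ROLE_OPERATIONS = {
    "officer": {"view", "edit", "print"},
    "clerk": {"view"},
}


def authorized_operations(user, doc):
    """Return the operations the user may perform on the document."""
    if user.role not in doc.related_roles:
        return set()  # document is unrelated to the user's role and duty
    if user.security_level < doc.security_level:
        return set()  # user's security level is below the document's
    return set(ROLE_OPERATIONS.get(user.role, set()))
```

A clerk with sufficient clearance on a related document would receive only `view`, while a user whose role is unrelated to the document receives no operations regardless of security level.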

Document Image Binarization Technique using MSER (MSER을 이용한 문서 이미지 이진화 기법)

  • Yu, Young-Jung
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.8 / pp.1941-1947 / 2014
  • Document image binarization is widely used as a preprocessing stage of document recognition, and the result of document recognition is strongly affected by the result of binarization. Many studies have addressed document image binarization, but their results vary according to the condition of the document images. In this paper, we propose a technique for document image binarization using MSER, which is used to extract objects from an image. First, raw MSER objects are extracted from a document image. Because the raw MSER objects cannot be used directly for binarization, they are modified. The final MSER objects are then used for document image binarization together with the contrast image extracted from the document image. Experimental results show that the proposed technique is useful for document image binarization.

A Study on eDocument Management Using Professional Terminologies (전문용어기반 eDocument 관리 방안에 관한 연구)

  • 김명옥
    • The Journal of Society for e-Business Studies / v.7 no.2 / pp.21-38 / 2002
  • Document retrieval (DR) has long been a serious issue in the field of Office Information Management. Nowadays, our daily work depends heavily on information collected from the internet, and DR methods on the Web have become an important issue that many researchers study more than any other topic. The main purpose of this study is to develop a model for managing business documents by integrating three major methodologies used in the fields of electronic libraries and information retrieval: Metadata, Thesaurus, and Index/Reversed Index. In addition, we have added a new concept, eDocument, which consists of metadata about unit documents and/or the unit documents themselves. eDocument is introduced as a way to utilize existing document sources. The core concepts and structures of the model are introduced, and the architecture of the eDocument management system is proposed. Test (simulation) results of the model and directions for future studies are also discussed.
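
How metadata, a thesaurus, and an index can work together for retrieval is easier to see in a small sketch. The one below is an assumption-laden illustration, not the eDocument model itself: the sample `DOCUMENTS` metadata, the `THESAURUS` entries, and the functions `build_inverted_index` and `search` are hypothetical, and the index shown is a standard inverted index from keyword to document IDs.

```python
from collections import defaultdict

# Hypothetical metadata records for three unit documents.
DOCUMENTS = {
    "doc1": {"title": "Quarterly invoice", "keywords": ["invoice", "payment"]},
    "doc2": {"title": "Supplier bill", "keywords": ["bill", "supplier"]},
    "doc3": {"title": "Meeting minutes", "keywords": ["meeting"]},
}

# Hypothetical thesaurus mapping a term to its related terms.
THESAURUS = {"invoice": {"bill"}, "bill": {"invoice"}}


def build_inverted_index(documents):
    """Map each metadata keyword to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, meta in documents.items():
        for term in meta["keywords"]:
            index[term].add(doc_id)
    return index


def search(term, index, thesaurus):
    """Expand the query term with the thesaurus, then look up the index."""
    terms = {term} | thesaurus.get(term, set())
    hits = set()
    for t in terms:
        hits |= index.get(t, set())  # .get avoids creating empty entries
    return sorted(hits)
```

With this data, a query for `invoice` also retrieves the document indexed only under `bill`, because the thesaurus expands the query before the index lookup.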

An Implementation Method of Improved Document DRM for Preventing Information Leakage using RBAC Approach (RBAC을 이용한 정보유출 방지를 위한 보안성이 강화된 문서 DRM 구현)

  • Choi, Young Hyun;Eom, Jung Ho;Chung, Tai Myoung
    • Journal of Korea Society of Digital Industry and Information Management / v.7 no.4 / pp.57-66 / 2011
  • We implemented a document DRM that applies the role-based access control (RBAC) mechanism to prevent the leakage of information from documents transmitted in a network environment. To improve security, it must prevent access to documents unrelated to the user's role and duty, and must authorize operations on a document by considering the user's role and security level according to the document's importance. We improved the security of document DRM by adding an access control module that applies RBAC to satisfy these security requirements. Even when a user accesses a document, our system grants operation authorizations for the document according to the user's role and security level and the security attributes of RBAC. Our system prevents indiscriminate access to documents by users who are not associated with the role, and prevents damage to confidentiality and integrity.

A study on development of XML-based electronic technical document in construction project (건설분야에서의 XML 기반의 전자(電子) 기술문서 개발 방안)

  • Jeong Seong-Yun
    • Proceedings of the Korean Institute Of Construction Engineering and Management / autumn / pp.573-576 / 2003
  • These days, with the spread of computers and the growth of internet technology, many of the paper documents produced in construction projects, such as forms, reports, accounts, and specifications, are being created as electronic documents on computers. The resulting electronic documents are actively distributed over the internet. Because existing electronic documents come in various formats, it is difficult for project participants to share and exchange document data over the internet. It is therefore necessary to define how construction-related technical documents are produced as electronic documents by a standard information system and how they are used. Accordingly, this study presents the procedure and method for developing electronic versions of technical documents using XML technology.