• 제목/요약/키워드: document

검색결과 4,940건 처리시간 0.023초

Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • 박종인;김남규
    • 지식경영연구
    • /
    • 제21권1호
    • /
    • pp.99-116
    • /
    • 2020
  • Text is the most widely used means of exchanging or expressing knowledge and information in the real world. Recently, researches on structuring unstructured text data for text analysis have been actively performed. One of the most representative document embedding method (i.e. doc2Vec) generates a single vector for each document using the whole corpus included in the document. This causes a limitation that the document vector is affected by not only core words but also other miscellaneous words. Additionally, the traditional document embedding algorithms map each document into only one vector. Therefore, it is not easy to represent a complex document with interdisciplinary subjects into a single vector properly by the traditional approach. In this paper, we introduce a multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. After introducing the previous study on multi-vector document embedding, we visually analyze the effects of the multi-vector document embedding method. Firstly, the new method vectorizes the document using only predefined keywords instead of the entire words. Secondly, the new method decomposes various subjects included in the document and generates multiple vectors for each document. The experiments for about three thousands of academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. With the multi-vector based method, we ascertained that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.

공인전자문서 소통을 위한 Document-HTML 문서 생성 기법의 설계 (Design of Document-HTML Generation Technique for Authorized Electronic Document Communication)

  • 황현천;김우제
    • 산업경영시스템학회지
    • /
    • 제44권1호
    • /
    • pp.51-59
    • /
    • 2021
  • Electronic document communication based on a digital channel is becoming increasingly important with the advent of the paperless age. The electronic document based on PDF format does not provide a powerful customer experience for a mobile device user despite replacing a paper document by providing the content integrity and the independence of various devices and software. On the other hand, the electronic document based on HTML5 format has weakness in the content integrity as there is no HTML5 specification for the content integrity despite its enhanced customer experience such as a responsive web technology for a mobile device user. In this paper, we design the Document-HTML, which provides the content integrity and the powerful customer experience by declaring the HTML5 constraint rules and the extended tags to contain the digital signature based on PKI. We analyze the existing electronic document that has been used in the major financial enterprise to develop a sample. We also verify the Document-HTML by experimenting with the sample of HTML electronic communication documents and analyze the PKI equation. The Document-HTML document can be used as an authorized electronic document communication and provide a powerful customer experience in the mobile environment between an enterprise and a user in the future.

비정형 데이터를 활용한 지능형 문서 처리 관리에 관한 연구 (A Study on Intelligent Document Processing Management using Unstructured Data)

  • 박경훈;서광규
    • 반도체디스플레이기술학회지
    • /
    • 제23권2호
    • /
    • pp.71-75
    • /
    • 2024
  • This research focuses on processing unstructured data efficiently, containing various formulas in document processing and management regarding the terms and rules of domestic insurance documents using text mining techniques. Through parsing and compilation technology, document context, content, constants, and variables are automatically separated, and errors are verified in order of the document and logic to improve document accuracy accordingly. Through document debugging technology, errors in the document are identified in real time. Furthermore, it is necessary to predict the changes that intelligent document processing will bring to document management work, in particular, the impact on documents and utilization tasks that are double managed due to various formulas and prepare necessary capabilities in the future.

  • PDF

고문서 정리(整理)에 대한 기록학적 연구 - 새로운 고문서 정리 방법의 모색을 위하여 - (An Archival Study on the Arrangement and Description of Old Document(Diploma))

  • 조경구
    • 기록학연구
    • /
    • 제7호
    • /
    • pp.37-74
    • /
    • 2003
  • An Old document(Diploma) is a historical and unique record, so it must be collected, arranged, and preserved for research as soon as possible. Especially, for the effective use of the Old Document(Diploma), it is needed to arrange and describe the material systematically on the ground of modern archival theory. The Kyujanggak Archives in the Seoul National University has published 23 volumes of Old document(Diploma) material Old Document(Diploma). But they seem to cause the readers inconvenience, because the materials are classified and gathered only by genre, the titles or the orders of the materials are not standardized, and there is no description about the content of each Old document(Diploma). Jangseo-gak Library in The Academy of Korean Studies has also published the series of Old document(Diploma) material Old Document(Diploma) Collection. However the case is not different, since they are all mixed up with materials classified and gathered by genre, family, academy, or local school. And a great part of the materials have no titles and no description about the content of each Old document(Diploma), either. About the arrangement and description of the records, European and American archival science has established the theory of l)the principle of provenance, 2)the principle of original order, 3)levels of control, 4)collective description. These theories are valuable for the effective use of Old document(Diploma). On the viewpoint of the principle of provenance, Old document(Diploma) materials should not be classified by subject and genre, but by family and person. Then, the Old document(Diploma) materials, after collected by the unit of family or person on the viewpoint of the principle of provenance, should be arranged in their original order for more detailed arrangement and furthermore, for the work to find their relationship. This is so called the principle of original order. The hierarchical management of the Old document(Diploma) materials, for example, classifying by record group, sub-group, series, item and so on, is the concept of the levels of control, and comprehensive description of the each hierarchical structure is the concept of the collective description. Let's apply these archival theories to 34 pieces of the Chung, Man-Seok's material in the series of Old document(Diploma) material Old Document(Diploma). First, collect the Old document(Diploma) materials into Chung, Man-Seok's collection(the principle of provenance), which were scattered in the series classified by genre. Secondly, rearrange them chronologically(the principle of original order), and then we can find the comprehensive information about Chung, Man-Seok. For the hierarchical management of the Old document(Diploma) materials, we should establish a few concepts from the general, large group to specific, small item. The concepts can be organized as following; l)record group(Chung, Man-Seok record group) - 2)sub-group(personnel document, property document, family document, social activity document, political activity document, etc) - 3)series(gyoji-series, gyoseo-series, yuji-series etc. in the personnel document) - 4)folder(document with additions) - 5)item(one document). According to the the theory of the collective description, in the level of record group, there should be a collective description of Chung, Man-Seok's biography or a summary of record group. Similarly, there should be a collective description of a summary of sub-group in the level of sub-group and a summary of series in the level of series.

합성 방식을 이용한 문서 화상의 보안 체계 연구 (A Study on Security System of Document Image using Mixing Algorithm)

  • 허윤석;김일경;박일남
    • 정보학연구
    • /
    • 제2권2호
    • /
    • pp.89-105
    • /
    • 1999
  • 본 논문은 컴퓨터를 이용한 문서 화상의 보안 통신에서 요구되는 문서 화상의 보안, 위조, 인증 등의 제반 분쟁에 대처하기 위한 연구이다. 문서 화상의 보안을 위해서 기존에 연구되어온 각종 암호화 방식 및 스크램블 방식과 같이 정보의 보안 여부를 노출시키고 비도에 의지하는 방식과 달리 정보의 보안여부를 제3자가 판독하기 어렵도록 하여 일상의 문서 교환으로 인식하게 함으로써 1차적으로 이의 해독에 따른 위험을 감소시키고 2차적으로는 해독이 가해진다 하여도 알고리즘 자체의 비도에 의해 해독을 용이하지 않도록 하는 방식의 보안 체계를 제안한다.

  • PDF

국방 정보시스템 환경에서 정보유출 방지를 위한 보안성이 강화된 문서 DRM 설계에 관한 연구 (A Study on An Architecture of the Security improved Document DRM for preventing Information Leakage in Military Information System Environment)

  • 엄정호
    • 디지털산업정보학회논문지
    • /
    • 제7권1호
    • /
    • pp.41-49
    • /
    • 2011
  • We designed a security improved document DRM for protecting document based military information which is transmitted in the military information system environment. The user should be could not access document which not related to his/her role and duty, and must view the only document appropriate for his/her role and security level according to the security level of document. We improved the security of document DRM by adding to the access control module in DRM server. Our system allows operation mode authorizations for the document, considering the user's role & security level and the security level of document. And it prevents indiscriminate access to the document and damage the confidentiality and integrity of information.

MSER을 이용한 문서 이미지 이진화 기법 (Document Image Binarization Technique using MSER)

  • 유영중
    • 한국정보통신학회논문지
    • /
    • 제18권8호
    • /
    • pp.1941-1947
    • /
    • 2014
  • 문서 이미지의 이진화는 문서 인식의 이전 단계에서 주로 사용되며, 이진화의 성공 여부에 따라 문서 인식의 결과에 영향을 미치는 중요한 단계로 볼 수 있다. 지금까지 문서 이미지를 이진화 하기 위한 다양한 기법들이 연구되었지만, 문서 이미지의 상태에 따라 그 결과는 다양하다. 본 논문에서는 객체 추출에 많이 이용되는 MSER(Maximally Stable Extremal Region)을 이용하여 문서 이미지를 이진화하는 기법을 제안한다. 먼저 문서 이미지에서 MSER 객체를 추출한다. 추출된 MSER 객체는 그 자체로 문서 이미지 이진화에 사용되기는 어렵기 때문에 사용하기 적합한 형태로 변경되는 과정을 거친다. 그리고 최종 MSER 객체와 문서 이미지로부터 추출한 대비 이진 이미지를 이용하여 최종 이진 이미지를 계산한다. 실험결과는 본 논문에서 제안한 방법이 문서 이미지의 이진화에 유용함을 보여준다.

전문용어기반 eDocument 관리 방안에 관한 연구 (A Study on eDocument Management Using Professional Terminologies)

  • 김명옥
    • 한국전자거래학회지
    • /
    • 제7권2호
    • /
    • pp.21-38
    • /
    • 2002
  • Document retrieval (DR) has been a serious issue for long in the field of Office Information Management. Nowadays, our daily work is becoming heavily dependent on the usage of information collected from the internet, and the DR methods on the Web has become an important issue which is studied more than any other topic by many researchers. The main purpose of this study is to develop a model to manage business documents by integrating three major methodologies used in the field of electronic library and information retrieval: Metadata, Thesaurus, and Index/Reversed Index. In addition, we have added a new concept of eDocument, which consists of metadata about unit documents and/or unit document themselves. eDocument is introduced as a way to utilize existing document sources. The core concepts and structures of the model were introduced, and the architecture of the eDocument management system has been proposed. Test (simulation) result of the model and the direction for the future studies were also mentioned.

  • PDF

RBAC을 이용한 정보유출 방지를 위한 보안성이 강화된 문서 DRM 구현 (An Implementation Method of Improved Document DRM for Preventing Information Leakage using RBAC Approach)

  • 최영현;엄정호;정태명
    • 디지털산업정보학회논문지
    • /
    • 제7권4호
    • /
    • pp.57-66
    • /
    • 2011
  • We implemented the document DRM applying role based access control(RBAC) mechanism for preventing the information leakage of a document which is transmitted in network environment. It must prevent to access document not related to user role and duty, and must allow operation to document for improving security, considering user role and security level according to a document importance. We improved the security of document DRM by adding to the access control module applying RBAC for satisfying security requirements. Though the user access document, our system allows operation authorizations to document by the user's role & security level and the security attribute of RBAC. Our system prevents indiscriminate access to the documents by user who is not associated with the role, and prevents damage the confidentiality and integrity.

건설분야에서의 XML 기반의 전자(電子) 기술문서 개발 방안 (A study on development of XML-based electronic technical document in construction project)

  • 정성윤
    • 한국건설관리학회:학술대회논문집
    • /
    • 한국건설관리학회 2003년도 학술대회지
    • /
    • pp.573-576
    • /
    • 2003
  • 최근 들어 컴퓨터의 보급과 인터넷 기술의 발전으로 건설사업을 수행하면서 발생하는 서식문서, 보고서, 계산서, 시방서 등 각종 문서를 컴퓨터를 사용하여 작성하고 작성된 전자문서는 인터넷을 통해 활발히 유통하고 있는 실정이다. 기존 전자문서는 다양한 포맷으로 작성되어 포맷간의 호환성의 문제 결여와 문서 내용이 의미하는 것을 사람과 컴퓨터가 함께 인식할 수 있는 건설관련 기술문서를 단순하게 전자적인 형태로 작성하기 않고 특정 하드웨어나 소프트웨어에서 구애를 받지 않으면서 표준화된 정보체계에 따라 전자문서화하고 기술문서의 활용성 제고를 위한 요구가 점차 강조되고 있다. 본 연구는 이러한 요구를 충족시키기 위해 건설CALS/EC의 구조적 언어인 XML 기술을 사용하여 기술문서를 전자문서화하기 위한 개발 절차와 방법을 마련하였다.

  • PDF