• Title/Summary/Keyword: Document research

Analysis on Current Issues and Cases of Electronic Document Delivery Service for Sharing of Knowledge Information (지식정보 공유를 위한 전자원문서비스의 주요 이슈와 사례 분석)

  • Yoo, Su-Hyeon; Choi, Hee-Yoon
    • Journal of the Korean Society for Information Management / v.23 no.2 / pp.81-96 / 2006
  • Changes in the document delivery service environment, such as the spread of web-based research communication and direct communication between users and information providers, have considerably affected document delivery service institutions. Rapid advances in information technology allow users to receive information on their desktops via the web. Because web-based document delivery makes large-scale reproduction and distribution possible, the rights of copyright holders need protection. This study identifies current trends and issues in the document delivery service environment and reviews electronic document delivery services in other countries. It also introduces the domestic electronic document delivery service, e-DDS, and evaluates the copyright issues it raises.

Structure Recognition Method of Invoice Document Image for Document Processing Automation (문서 처리 자동화를 위한 인보이스 이미지의 구조 인식 방법)

  • Dong-seok Lee; Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.11-19 / 2023
  • In this paper, we propose methods for recognizing the structure of an invoice document and for generating a spreadsheet from it. A deep-learning-based optical character recognition (OCR) engine recognizes the text and location of each word block. Word blocks that share a row or a column are identified from their coordinates, and the document area is divided according to the arrangement of the word blocks. The character recognition results are then entered into the spreadsheet according to the document structure. In simulations, item placement with the proposed method achieves an average accuracy of 92.30%.
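
The row and column grouping described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `WordBlock` type, the `cluster_1d` and `to_grid` helpers, and the pixel tolerance are all assumptions made for illustration, and the OCR step is replaced by hand-written coordinates.

```python
from dataclasses import dataclass

@dataclass
class WordBlock:
    text: str
    x: float  # left edge of the block (hypothetical OCR output)
    y: float  # top edge of the block (hypothetical OCR output)

def cluster_1d(values, tolerance):
    """Group coordinates; a new group starts when the gap exceeds the tolerance."""
    groups = []
    for v in sorted(set(values)):
        if groups and v - groups[-1][-1] <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def to_grid(blocks, tolerance=5.0):
    """Place each word block into a row/column cell based on its coordinates."""
    row_groups = cluster_1d([b.y for b in blocks], tolerance)
    col_groups = cluster_1d([b.x for b in blocks], tolerance)

    def index_of(groups, v):
        return next(i for i, g in enumerate(groups) if v in g)

    grid = [["" for _ in col_groups] for _ in row_groups]
    for b in blocks:
        grid[index_of(row_groups, b.y)][index_of(col_groups, b.x)] = b.text
    return grid

blocks = [
    WordBlock("Item", 10, 10), WordBlock("Price", 100, 12),
    WordBlock("Pen", 11, 40), WordBlock("2.50", 101, 41),
]
grid = to_grid(blocks)
```

Each inner list of `grid` corresponds to one spreadsheet row. A fixed tolerance is the simplest possible choice; real invoices with skew or variable font sizes would likely need something more adaptive.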

Multi-Document Summarization Method Based on Semantic Relationship using VAE (VAE를 이용한 의미적 연결 관계 기반 다중 문서 요약 기법)

  • Baek, Su-Jin
    • Journal of Digital Convergence / v.15 no.12 / pp.341-347 / 2017
  • As the volume of document data grows, users need summarized information to understand documents. However, existing document summarization methods rely on overly simple statistics, and multi-document summarization that handles sentence ambiguity and generates meaningful sentences has not been studied sufficiently. In this paper, we investigate semantic connections and a preprocessing step that filters out unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that uses a variational autoencoder (VAE) to strengthen semantic connectivity between sentences. Using sentence word vectors, the method reconstructs sentences after learning from the compressed information and attribute discriminators generated as latent variables, and semantic connection processing then produces a natural summary sentence. Compared with other document summarization methods, the proposed method showed a slight performance improvement, indicating that semantic sentence generation and connectivity can be improved. In future work, we will study how to extend semantic connections by experimenting with various attribute settings.

A Study on Extracting the Document Text for Unallocated Areas of Data Fragments (비할당 영역 데이터 파편의 문서 텍스트 추출 방안에 관한 연구)

  • Yoo, Byeong-Yeong; Park, Jung-Heum; Bang, Je-Wan; Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.6 / pp.43-51 / 2010
  • Investigating data in unallocated space is meaningful because it allows deleted data to be examined. File carving can fully recover files stored contiguously in unallocated space, but it cannot recover noncontiguous or incomplete data. Data fragments nevertheless need to be analyzed because they are likely to contain large amounts of information. The text of Microsoft Word, Excel, PowerPoint, and PDF document files is stored using compression or a document-specific format. If part of such a document file remains in an unallocated data fragment, its text can be extracted by using that document format. In this paper, we propose a method for extracting the text of particular document file types from unallocated data fragments.
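
As a rough illustration of why compressed text can sometimes be recovered from an incomplete fragment, the sketch below decompresses the leading part of a truncated zlib stream. It is an assumption-laden example, not the paper's method: the `recover_prefix` helper is hypothetical, the fragment is assumed to begin exactly at the start of a zlib stream, and the sample data is invented.

```python
import zlib

def recover_prefix(fragment):
    """Decompress as much of a truncated zlib stream as the fragment allows."""
    decoder = zlib.decompressobj()
    try:
        return decoder.decompress(fragment)
    except zlib.error:
        # The fragment does not begin at a valid zlib stream boundary.
        return b""

original = b"Invoice total: 42 USD. " * 50
compressed = zlib.compress(original)
fragment = compressed[: len(compressed) // 2]  # simulate an incomplete fragment
recovered = recover_prefix(fragment)
```

Whatever is recovered is a prefix of the original text. A fragment that starts in the middle of a stream cannot be handled this way, which is part of what makes fragment analysis hard.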

A DOM-Based Fuzzing Method for Analyzing Seogwang Document Processing System in North Korea (북한 서광문서처리체계 분석을 위한 Document Object Model(DOM) 기반 퍼징 기법)

  • Park, Chanju; Kang, Dongsu
    • KIPS Transactions on Computer and Communication Systems / v.8 no.5 / pp.119-126 / 2019
  • Typical software developed and used in North Korea includes Red Star and internal application software. However, most existing research on North Korean software covers only installation methods and general analysis of execution screens. File fuzzing is a typical method for identifying security vulnerabilities in software. In this paper, we use file fuzzing to analyze security vulnerabilities in the software of North Korea's Seogwang Document Processing System. We propose analyzing the Open Document Text (ODT) files produced by the Seogwang Document Processing System, extracting nodes based on the Document Object Model (DOM) to determine test targets, and generating mutated files through insertion and substitution. This approach increases the number of crashes detected in the same testing time.
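
The idea of node-level mutation by insertion and substitution can be sketched as follows. This is a simplified illustration, not the authors' fuzzer: the `mutate` function, the payload sizes, and the tiny `SAMPLE` document are assumptions, and a real ODT file is a ZIP archive whose `content.xml` would have to be unpacked and repacked around this step.

```python
import copy
import random
import xml.etree.ElementTree as ET

SAMPLE = "<doc><p>hello</p><p>world</p></doc>"  # stand-in for an ODT content.xml

def mutate(xml_text, seed):
    """Pick one DOM node and mutate it by substitution or insertion."""
    rng = random.Random(seed)
    root = ET.fromstring(xml_text)
    target = rng.choice(list(root.iter()))
    if rng.random() < 0.5:
        # Substitution: replace the node's text with an oversized payload.
        target.text = "A" * rng.choice([256, 4096, 65536])
    else:
        # Insertion: nest a copy of the node inside itself.
        target.append(copy.deepcopy(target))
    return ET.tostring(root, encoding="unicode")

mutants = [mutate(SAMPLE, seed) for seed in range(5)]
```

Because the mutation works on parsed nodes, every mutant stays well-formed XML, so the target application gets past its parser and exercises deeper processing code. Seeding the generator keeps each test case reproducible when a crash needs to be replayed.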

Automated networked knowledge map using keyword-based document networks (키워드 기반 문서 네트워크를 이용한 네트워크형 지식지도 자동 구성)

  • Yoo, Keedong
    • Knowledge Management Research / v.19 no.3 / pp.47-61 / 2018
  • A knowledge map, a taxonomy of knowledge repositories, must support and enhance the knowledge user's activity of searching for and selecting proper knowledge for problem-solving. Conventional knowledge maps, however, are hierarchically categorized and cannot support such activity, which must coincide with the user's cognitive process for knowledge utilization. This paper therefore aims to develop and verify a methodology for building a networked knowledge map that supports the user's search and retrieval of proper knowledge through referential navigation between content-relevant knowledge. Keywords are used as the semantic information between knowledge items because they represent the overall contents of a given document and can serve as semantic information on the links between related documents. Aggregating the links between documents yields a document network, from which a keyword-based networked knowledge map is built. A domain-expert-based validation test on a networked knowledge map of 50 research papers confirmed that the proposed methodology performs outstandingly in terms of precision and recall.
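
A keyword-based document network of the kind described here can be sketched in a few lines. The example assumes that keywords are already extracted for each document; the `build_document_network` function, the `min_shared` threshold, and the sample data are illustrative and not taken from the paper.

```python
from itertools import combinations

def build_document_network(doc_keywords, min_shared=1):
    """Link two documents when they share at least min_shared keywords."""
    edges = {}
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = doc_keywords[a] & doc_keywords[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared  # the shared keywords label the link
    return edges

doc_keywords = {
    "d1": {"ontology", "knowledge map"},
    "d2": {"knowledge map", "retrieval"},
    "d3": {"fuzzing"},
}
network = build_document_network(doc_keywords)
```

Here only `d1` and `d2` are linked, through the keyword "knowledge map", while `d3` stays isolated. Navigating from one document to its neighbors along such links is the referential navigation the abstract contrasts with a fixed hierarchy.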

A Automatic Document Summarization Method based on Principal Component Analysis

  • Kim, Min-Soo; Lee, Chang-Beom; Baek, Jang-Sun; Lee, Guee-Sang; Park, Hyuk-Ro
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.491-503 / 2002
  • In this paper, we propose an automatic document summarization method based on principal component analysis (PCA), a multivariate statistical method. After extracting thematic words using PCA, we select the sentences that contain the extracted thematic words and compose the document summary from them. Experimental results on newspaper articles show that the proposed method is superior to methods using either word frequency or an information retrieval thesaurus.
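
The two steps in this abstract, extracting thematic words with PCA and then selecting sentences that contain them, can be approximated with the sketch below. It is a simplification under stated assumptions: only the first principal component is used, it is estimated by power iteration over a sentence-by-word count matrix, no stop words are removed, and the function names and sample sentences are invented for illustration.

```python
import re

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector of the column covariance by power iteration."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / max(n - 1, 1) for j in range(m)]
           for i in range(m)]
    vec = [1.0 + 0.01 * j for j in range(m)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

def summarize(sentences, n_words=2):
    """Return (thematic words, sentences containing at least one of them)."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    counts = [[tokenize(s).count(w) for w in vocab] for s in sentences]
    loadings = first_component(counts)
    ranked = sorted(zip(vocab, loadings), key=lambda pair: -abs(pair[1]))
    thematic = [word for word, _ in ranked[:n_words]]
    summary = [s for s in sentences if set(thematic) & set(tokenize(s))]
    return thematic, summary

sentences = [
    "The budget vote was delayed again.",
    "Lawmakers argued about the budget for hours.",
    "Rain is expected over the weekend.",
]
thematic, summary = summarize(sentences)
```

Words with the largest absolute loadings on the first component are treated as thematic. A fuller version would use several components and filter stop words, which this sketch leaves out to stay short.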

Automatic Single Document Text Summarization Using Key Concepts in Documents

  • Sarkar, Kamal
    • Journal of Information Processing Systems / v.9 no.4 / pp.602-620 / 2013
  • Many previous studies on extractive text summarization treat a subset of the words in a document as keywords and use a ranking function that scores sentences by their similarity to the list of extracted keywords. The use of key concepts in automatic text summarization, however, has received less attention in the summarization literature. The proposed work uses key concepts identified from a document to create a summary of that document. We view the single-word or multi-word keyphrases of a document as the important concepts that the document elaborates on. Our work is based on the hypothesis that an extract elaborates the important concepts to a permissible extent, controlled by the given summary length restriction. In other words, our method chooses a subset of sentences from a document that maximizes the coverage of important concepts in the final summary. To allow diverse information in the summary, for each important concept we select one sentence that is the best possible elaboration of the concept. Accordingly, the most important concept contributes to the summary first, then the second most important concept, and so on. To evaluate the effectiveness of the proposed method, we compared it with several state-of-the-art summarization systems, and the results show that it outperforms the systems to which it is compared.
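
The selection loop described in this abstract, one best sentence per concept in order of importance until the length limit is reached, can be sketched as follows. The scoring rule (counting occurrences of the concept in a sentence) and the `summarize_by_concepts` name are assumptions for illustration; the paper's own keyphrase extraction and scoring are not reproduced here.

```python
def summarize_by_concepts(sentences, concepts, max_sentences):
    """concepts must be ordered from most to least important."""
    chosen = []
    for concept in concepts:
        if len(chosen) >= max_sentences:
            break
        candidates = [s for s in sentences
                      if s not in chosen and concept in s.lower()]
        if candidates:
            # Assumed score: the sentence mentioning the concept most often wins.
            chosen.append(max(candidates, key=lambda s: s.lower().count(concept)))
    return sorted(chosen, key=sentences.index)  # restore document order

sentences = [
    "Solar power is growing fast.",
    "Wind turbines need steady wind.",
    "Solar power and solar panels get cheaper.",
    "Stocks fell today.",
]
summary = summarize_by_concepts(sentences, ["solar", "wind"], max_sentences=2)
```

Taking at most one sentence per concept is what keeps the summary diverse: the second "solar" sentence is never added even though it scores well.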

Document Clustering Using Semantic Features and Fuzzy Relations

  • Kim, Chul-Won; Park, Sun
    • Journal of information and communication convergence engineering / v.11 no.3 / pp.179-184 / 2013
  • Traditional clustering methods are usually based on the bag-of-words (BOW) model. A disadvantage of the BOW model is that it ignores the semantic relationship among terms in the data set. To resolve this problem, ontology or matrix factorization approaches are usually used. However, a major problem of the ontology approach is that it is usually difficult to find a comprehensive ontology that can cover all the concepts mentioned in a collection. This paper proposes a new document clustering method using semantic features and fuzzy relations for solving the problems of ontology and matrix factorization approaches. The proposed method can improve the quality of document clustering because the clustered documents use fuzzy relation values between semantic features and terms to distinguish clearly among dissimilar documents in clusters. The selected cluster label terms can represent the inherent structure of a document set better by using semantic features based on non-negative matrix factorization, which is used in document clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
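
The limitation of the bag-of-words model mentioned in this abstract is easy to demonstrate: two documents about the same topic score zero similarity when they share no surface terms. The sketch below shows only that limitation, not the proposed semantic-feature method; the `bow_cosine` helper and the example phrases are illustrative.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between two texts under a plain bag-of-words model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

same_topic = bow_cosine("car engine repair", "automobile motor maintenance")
identical = bow_cosine("car engine repair", "car engine repair")
```

`same_topic` is exactly zero because the two phrases have no words in common, even though a reader would consider them closely related. Approaches such as ontologies or matrix factorization aim to close that gap.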

A Study on Transforming ICT Research Information Service into Semantic Web Environment

  • Song, Jong-Cheol; Moon, Byung-Joo; Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering / v.5 no.3 / pp.249-253 / 2007
  • Research on ICT (information and communication technology) is categorized under the government's IT839 Strategy, and the government is driving research on IT839 technologies. Transforming this category scheme and the research information into a Semantic Web environment enables search functions that use a knowledge base and information objects by means of a TBox and an ABox. This study therefore proposes a technique for generating Semantic Web documents about ICT research information. The ontology is constructed from the IT839 Strategy categories. The proposed framework maps ontology instances directly and, where direct mapping is not possible, builds reliable Semantic Web documents through indirect mapping based on machine learning. As a result, low-cost, high-quality Semantic Web documents about ICT research information can be built.