• Title/Summary/Keyword: Hangul Document Information


Implementation of the Access Control System for Hangul Document System (한글 문서 접근 제어시스템 구현)

  • Jang, Seung-Ju
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.2 / pp.323-329 / 2018
  • In this paper, we implement an access control system that allows only specific users to open documents in the Hangul document system. The design analyzes the structure of a Hangul document and transforms its header information. By modifying the function of a specific field in the header, the system prevents users who do not hold the data for the modified information from opening and viewing the document. Controlling access rights to important Hangul documents makes it possible to manage Hangul files more safely. The design was implemented and tested, and the experimental results confirmed that the access control system operates normally.

Keyword Spotting on Hangul Document Images Using Image-to-Image Matching (영상 대 영상 매칭을 이용한 한글 문서 영상에서의 단어 검색)

  • Park Sang Cheol;Son Hwa Jeong;Kim Soo Hyung
    • The KIPS Transactions:PartB / v.12B no.3 s.99 / pp.357-364 / 2005
  • In this paper, we propose an accurate and fast keyword spotting system that searches for user-specified keywords in Hangul document images using two-level image-to-image matching. The system consists of character segmentation, query image creation, feature extraction, and a matching procedure. Two different feature vectors are used in the matching procedure. An experiment using 1600 Hangul word images from 8 document images, downloaded from the website of the Korea Information Science Society, demonstrates that the proposed system is superior to conventional image-based document retrieval systems.

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia Pacific Journal of Information Systems / v.12 no.2 / pp.179-195 / 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean maintaining acceptable classification performance while taking less computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, which is the main algorithm of case-based reasoning. Then the TFIDF method, the traditional vector model in information retrieval, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time for classifying one document decreased remarkably.
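
The two-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity measures, so the sketch assumes whitespace tokenization, Jaccard term overlap for the k-nearest-neighbor retrieval stage, and a cosine-weighted vote over TF-IDF vectors built only from the retrieved cases. All function names (`retrieve_k`, `tfidf_vectors`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization; real Hangul text would need morphological analysis.
    return text.lower().split()


def retrieve_k(query_tokens, case_base, k):
    """Stage 1: pick the k cases with the highest Jaccard overlap with the query."""
    q = set(query_tokens)

    def jaccard(case):
        d = set(case["tokens"])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(case_base, key=jaccard, reverse=True)[:k]


def tfidf_vectors(token_lists):
    """Build TF-IDF vectors over the given documents only (smoothed IDF)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def classify(query_text, case_base, k=3):
    """Stage 2: TF-IDF similarity vote restricted to the k retrieved cases."""
    query_tokens = tokenize(query_text)
    retrieved = retrieve_k(query_tokens, case_base, k)
    vectors = tfidf_vectors([query_tokens] + [c["tokens"] for c in retrieved])
    votes = defaultdict(float)
    for case, vec in zip(retrieved, vectors[1:]):
        votes[case["label"]] += cosine(vectors[0], vec)
    return max(votes, key=votes.get) if votes else None
```

Because the TF-IDF weights are computed over only k + 1 documents rather than the whole case base, the second stage stays cheap, which is consistent with the reduced classification time the abstract reports.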

A Study of Automatic Indexing Technique based on Logical Structure of SGML Hangul Document (SGML 한글문서의 논리적 구조에 근거한 색인기법에 관한 연구)

  • 유석종
    • Journal of the Korean Society for Information Management / v.12 no.2 / pp.85-101 / 1995
  • Conventional indexing systems support only full-text indexing of electronic documents and do not use the logical structure of documents in retrieval. Most electronic documents are stored in different formats depending on the system that produced them, and those formats indicate only the physical style of the document without considering its logical structure. In an effort to standardize the exchange of documents, ISO developed SGML (Standard Generalized Markup Language), which carries information about the logical structure of documents. In this paper, to overcome the disadvantages of full-text indexing and to use a standard document format, an indexing system for SGML documents is designed and implemented. In this system, the user can assign an indexing domain to elements, so that the logical structure of the document is reflected in information retrieval. Various retrieval methods can be implemented by using the structural information of the document. In addition, the system supports automatic indexing of SGML Hangul documents.


A Hangul Document Image Retrieval System Using Rank-based Recognition (웨이브렛 특징과 순위 기반 인식을 이용한 한글 문서 영상 검색 시스템)

  • Lee Duk-Ryong;Kim Woo-Youn;Oh Il-Seok
    • The Journal of the Korea Contents Association / v.5 no.2 / pp.229-242 / 2005
  • We constructed a full-text retrieval system for scanned Hangul document images. The system consists of three components: preprocessing, recognition, and retrieval. The retrieval algorithm uses recognition results up to rank k. The algorithm is not only insensitive to recognition errors but also has the advantage of user-controllable recall and precision. For an objective performance evaluation, we used scanned images of the Journal of Korea Information Science Society provided by KISTI. The system was shown to be practical through the evaluation of recognition and retrieval rates.
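
The idea of retrieving with recognition results up to rank k can be illustrated with a small sketch. It assumes, hypothetically, that the recognizer outputs a best-first list of candidate characters for each character position and that a query matches wherever every query character appears among the top-k candidates at its position. The abstract does not give the actual matching rule, and `find_matches` is an illustrative name.

```python
def find_matches(query, candidates, k):
    """Return start positions where each query character is among the
    top-k recognition candidates at the corresponding position.

    candidates: one list per character position, ordered best-first.
    """
    if not query:
        return []
    hits = []
    for start in range(len(candidates) - len(query) + 1):
        if all(ch in candidates[start + i][:k] for i, ch in enumerate(query)):
            hits.append(start)
    return hits


# Hypothetical recognizer output: the second character is misrecognized at rank 1.
candidates = [["한", "힌"], ["굴", "글"], ["문", "운"], ["서", "시"]]
print(find_matches("한글", candidates, k=1))  # rank-1 only: the error hides the match
print(find_matches("한글", candidates, k=2))  # rank-2 candidates recover it
```

Raising k lets more positions match, which trades precision for recall; this is one plausible reading of the "user-controllable recall and precision" mentioned in the abstract.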


Bilingual document analysis and character segmentation using connected components (연결요소를 이용한 한.영 혼용문서의 구조분석 및 낱자분리)

  • 김민기;권영빈;한상용
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.3 / pp.410-422 / 1997
  • In this paper, we describe a bottom-up document structure analysis method for bilingual Korean-English documents. We propose a character segmentation method based on the layout information of the connected components of each character. In many studies, a document is analyzed into text blocks and graphics. We analyze a document into four parts: text, table, graphic, and separator. Text is recursively subdivided into text blocks, text lines, words, and characters. To extract characters from bilingual text, we propose a new method of word separation for Korean and English. Furthermore, we apply a character merging and segmentation method to the Korean word blocks in accordance with the properties of Hangul. Experimental results on various documents show that the proposed method performs document structure analysis and character segmentation effectively.


Security Analysis on Digital Signature Function Implemented in Electronic Documents Software (전자문서 소프트웨어의 전자서명 기능에 대한 안전성 분석)

  • Park, Sunwoo;Lee, Changbin;Lee, Kwangwoo;Kim, Jeeyeon;Lee, Youngsook;Won, Dongho
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.5 / pp.945-957 / 2012
  • With electronic documents, it is not easy to detect whether a document has been modified, so verifying document integrity is very important when using them. To facilitate this, various electronic document software products provide built-in digital signature capabilities. However, there has been little research on the security of the digital signature functions of such software. In this paper, we therefore analyze the security of Adobe PDF, MS Word, Hancom Hangul, a digital notary service, and a digital year-end-settlement service, and propose recommendations for implementing digital signature functions.

Feature Selection for a Hangul Text Document Classification System (한글 텍스트 문서 분류시스템을 위한 속성선택)

  • Lee, Jae-Sik;Cho, You-Jung
    • Proceedings of the Korea Intelligent Information System Society Conference / 2003.05a / pp.435-442 / 2003
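  • An information retrieval system is a tool that helps users find the information they need among vast amounts of information. It must deliver the information a user requests more accurately, more effectively, and more efficiently. To do so, it is essential to select, from the countless features in a document, only those that reflect the document's characteristics well, and to use them appropriately. This study therefore focuses on improving the existing Hangul document classification system (CB_TFIDF)[1] in two respects: accuracy and speed. Of the various feature selection techniques previously applied to English text document classification systems, we take three well-known ones, namely Information Gain, Odds Ratio, and Document Frequency Thresholding, use them to build a selective case base, apply it to the Hangul text document classification system, and compare the resulting performance. We then aim to present the feature selection technique best suited to Hangul document classification systems, together with guidelines for feature selection.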
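
The three feature selection measures named in this abstract can be sketched with their standard textbook definitions. This is an illustrative sketch, not the authors' code: the abstract gives no formulas, the additive smoothing constant `eps` in the odds ratio is an assumption added to avoid division by zero, and all function names are hypothetical. Each document is represented as a `(tokens, label)` pair.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def select_by_df(docs, min_df):
    """Document Frequency Thresholding: keep terms found in at least min_df documents."""
    df = Counter(t for tokens, _ in docs for t in set(tokens))
    return {t for t, n in df.items() if n >= min_df}


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether the term is present."""
    labels = Counter(label for _, label in docs)
    with_t = Counter(label for tokens, label in docs if term in tokens)
    without_t = labels - with_t
    n, n_t = len(docs), sum(with_t.values())
    return (entropy(labels.values())
            - (n_t / n) * entropy(with_t.values())
            - ((n - n_t) / n) * entropy(without_t.values()))


def odds_ratio(term, docs, positive_label, eps=0.5):
    """Odds of the term in the positive class divided by its odds elsewhere.
    eps is an assumed additive smoothing constant."""
    pos = [tokens for tokens, label in docs if label == positive_label]
    neg = [tokens for tokens, label in docs if label != positive_label]
    p = (sum(term in d for d in pos) + eps) / (len(pos) + 2 * eps)
    q = (sum(term in d for d in neg) + eps) / (len(neg) + 2 * eps)
    return (p * (1 - q)) / ((1 - p) * q)
```

A term that appears in every document of one class and in no others receives the maximum information gain for that corpus, while a term spread evenly across classes receives a gain near zero and an odds ratio near one.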


A Keyword Matching for the Retrieval of Low-Quality Hangul Document Images

  • Na, In-Seop;Park, Sang-Cheol;Kim, Soo-Hyung
    • Journal of the Korean Society for Library and Information Science / v.47 no.1 / pp.39-55 / 2013
  • Keyword retrieval is difficult for low-quality Korean document images because they include adjacent characters that are connected. In addition, images created from various fonts are likely to be distorted during acquisition. In this paper, we propose and test a keyword retrieval system for low-quality Korean document images that uses a support vector machine (SVM) to discriminate the similarity between two word images. We demonstrate that the proposed keyword retrieval method is more effective than the accumulated Optical Character Recognition (OCR)-based searching method. Moreover, the SVM performs better than a Bayesian decision rule or an artificial neural network at determining the similarity of two images.

Study on Methods of Digitalization of Older Books Using PDF (PDF를 활용한 고문헌의 원문디지털화 방안에 대한 고찰)

  • Lee, Sang-Yong
    • Journal of the Korean Society for Library and Information Science / v.34 no.1 / pp.133-153 / 2000
  • This article studies methods of digitizing older books using PDF (Portable Document Format) as supported by Acrobat 4.0, which was introduced in April 1999. Acrobat 3.0 caused many problems in supporting the Korean language, or Hangul. However, the revised version 4.0 made conversion of Korean, Japanese, and Chinese text possible through its support for multi-language fonts. It is therefore possible to convert and edit the text files of older books written in Hangul. Acrobat Reader, the PDF viewer, can be downloaded free of charge from its website. The text of older books digitized as PDF still has some problems, but users can easily retrieve it from the Internet.
