• Title/Summary/Keyword: Document Recognition

Search Result 182, Processing Time 0.028 seconds

Research and Development of Document Recognition System for Utilizing Image Data (이미지데이터 활용을 위한 문서인식시스템 연구 및 개발)

  • Kwag, Hee-Kue
    • The KIPS Transactions:PartB
    • /
    • v.17B no.2
    • /
    • pp.125-138
    • /
    • 2010
  • The purpose of this research is to enhance document recognition system which is essential for developing full-text retrieval system of the document image data stored in the digital library of a public institution. To achieve this purpose, the main tasks of this research are: 1) analyzing the document image data and then developing its image preprocessing technology and document structure analysis one, 2) building its specialized knowledge base consisting of document layout and property, character model and word dictionary, respectively. In addition, developing the management tool of this knowledge base, the document recognition system is able to handle the various types of the document image data. Currently, we developed the prototype system of document recognition which is combined with the specialized knowledge base and the library of document structure analysis, respectively, adapted for the document image data housed in National Archives of Korea. With the results of this research, we plan to build up the test-bed and estimate the performance of document recognition system to maximize the utilization of full-text retrieval system.

Feature Extraction Method for the Character Recognition of the Low Resolution Document

  • Kim, Dae-Hak;Cheong, Hyoung-Chul
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.525-533
    • /
    • 2003
  • In this paper we introduce some existing preprocessing algorithm for character recognition and consider feature extraction method for the recognition of low resolution document. Image recognition of low resolution document including fax images can be frequently misclassified due to the blurring effect, slope effect, noise and so on. In order to overcome these difficulties in the character recognition we considered a mesh feature extraction and contour direction code feature. System for automatic character recognition were suggested.

  • PDF

Structure Recognition Method of Invoice Document Image for Document Processing Automation (문서 처리 자동화를 위한 인보이스 이미지의 구조 인식 방법)

  • Dong-seok Lee;Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.2
    • /
    • pp.11-19
    • /
    • 2023
  • In this paper, we propose the methods of invoice document structure recognition and of making a spreadsheet electronic document. The texts and block location information of word blocks are recognized by an optical character recognition engine through deep learning. The word blocks on the same row and same column are found through their coordinates. The document area is divided through arrangement information of the word blocks. The character recognition result is inputted in the spreadsheet based on the document structure. In simulation result, the item placement through the proposed method shows an average accuracy of 92.30%.

Development of an image processing algorithm for korean document recognition (인식률을 향상한 한글문서 인식 알고리즘 개발)

  • 김희식;김영재;이평원
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1997.10a
    • /
    • pp.1391-1394
    • /
    • 1997
  • This paper proposes a new image processing algorithm to recognize korean documents. It take out the region of text area form input image, then it makes esgmentation of lines, words and characters in the text. A precision segmentation is very important to recognize the input document. The input image has 8-bit gray scaled resolution. Not only the histogram but also brightness dispersion graph are used for segmentation. The result shows a higher accuracy of document recognition.

  • PDF

Document Image Binarization Technique using MSER (MSER을 이용한 문서 이미지 이진화 기법)

  • Yu, Young-Jung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.8
    • /
    • pp.1941-1947
    • /
    • 2014
  • Document image binarization is largely used as previous stage of document recognition. And the result of document recognition is much affected from the result of document image binarization. There were many studies to binarize document images. The results of previous studies for document image binarization is varied according to the state of document images. In this paper, we propose a technique for document image binarization using MSER that is applied to extract objects from an image. At first, raw MSER objects are extracted from a document image. Because the raw MSER objects cannot be used for document image binarization, the extracted raw MSER objects are modified. Then the final MSER objects are used for document image binarization with the contrast image that is extracted from the document image. Experimental results show that the proposed technique is useful for document image binarization.

Mongolian Traditional Stamp Recognition using Scalable kNN

  • Gantuya., P;Mungunshagai., B;Suvdaa., B
    • International journal of advanced smart convergence
    • /
    • v.4 no.2
    • /
    • pp.170-176
    • /
    • 2015
  • The stamp is one of the crucial information of traditional historical and cultural for nations. In this paper, we purpose to detect official stamps from scanned document and recognize the Mongolian traditional, historical stamps. Therefore we performed following steps: first, we detect official stamps from scanned document based on red-color segmentation and document standard. Then we collected 234 traditional stamp images with 6 classes and 100 official stamp images from scanned document images. Also we implemented the processing algorithms for noise removing, resize and reshape etc. Finally, we proposed a new scale invariant classification algorithm based on KNN (k-nearest neighbor). In the experimental result, our proposed a method had shown proper recognition rate.

Methods of Classification and Character Recognition for Table Items through Deep Learning (딥러닝을 통한 문서 내 표 항목 분류 및 인식 방법)

  • Lee, Dong-Seok;Kwon, Soon-Kak
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.5
    • /
    • pp.651-658
    • /
    • 2021
  • In this paper, we propose methods for character recognition and classification for table items through deep learning. First, table areas are detected in a document image through CNN. After that, table areas are separated by separators such as vertical lines. The text in document is recognized through a neural network combined with CNN and RNN. To correct errors in the character recognition, multiple candidates for the recognized result are provided for a sentence which has low recognition accuracy.

Adaptive Binarization for Camera-based Document Recognition (카메라 기반 문서 인식을 위한 적응적 이진화)

  • Kim, In-Jung
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.12 no.3
    • /
    • pp.132-140
    • /
    • 2007
  • The quality of the camera image is worse than that of the scanner image because of lighting variation and inaccurate focus. This paper proposes a binarization method for camera-based document recognition, which is tolerant to low-quality camera images. Based on an existing method reported to be effective in previous evaluations, we enhanced the adaptability to the image with a low contrast due to low intensity and inaccurate focus. Furthermore, applying an additional small-size window in the binarization process, it is effective to extract the fine detail of character structure, which is often degraded by conventional methods. In experiments, we applied the proposed method as well as other methods to a document recognizer and compared the performance for many cm images. The result showed the proposed method is effective for recognition of document images captured by the camera.

  • PDF

Structure Recognition Method in Various Table Types for Document Processing Automation (문서 처리 자동화를 위한 다양한 표 유형에서 표 구조 인식 방법)

  • Lee, Dong-Seok;Kwon, Soon-Kak
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.5
    • /
    • pp.695-702
    • /
    • 2022
  • In this paper, we propose the method of a table structure recognition in various table types for document processing automation. A table with items surrounded by ruled lines are analyzed by detecting horizontal and vertical lines for recognizing the table structure. In case of a table with items separated by spaces, the table structure are recognized by analyzing the arrangement of row items. After recognizing the table structure, the areas of the table items are input into OCR engine and the character recognition result output to a text file in a structured format such as CSV or JSON. In simulation results, the average accuracy of table item recognition is about 94%.

Step-by-step Approach for Effective Korean Unknown Word Recognition (한국어 미등록어 인식을 위한 단계별 접근방법)

  • Park, So-Young
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.369-372
    • /
    • 2009
  • Recently, newspapers as well as web documents include many newly coined words such as "mid"(meaning "American drama" since "mi" means "America" in Korean and "d" refers to the "d" of drama) and "anseup"(meaning "pathetic" since "an" and "seup" literally mean eyeballs and moist respectively). However, these words cause a Korean analyzing system's performance to decrease. In order to recognize these unknown word automatically, this paper propose a step-by-step approach consisting of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The proposed approach includes the phase based on full text analysis to recognize accurately the unknown words occurred once and again in a document. Also, the proposed approach includes two phases based on web document frequency to recognize broadly the unknown words occurred once in the document. Besides, the proposed model divides between an unknown noun recognition phase and an unknown verb recognition phase to recognize various unknown words. Experimental results shows that the proposed approach improves precision 1.01% and recall 8.50% as compared with a previous approach.

  • PDF