• Title/Summary/Keyword: Document Image

A Hangul Document Image Retrieval System Using Rank-based Recognition (웨이브렛 특징과 순위 기반 인식을 이용한 한글 문서 영상 검색 시스템)

  • Lee Duk-Ryong; Kim Woo-Youn; Oh Il-Seok
    • The Journal of the Korea Contents Association / v.5 no.2 / pp.229-242 / 2005
  • We constructed a full-text retrieval system for scanned Hangul document images. The system consists of three parts: preprocessing, recognition, and retrieval components. The retrieval algorithm uses recognition results up to rank k. The algorithm is not only insensitive to recognition errors but also lets the user control recall and precision. For objective performance evaluation, we used scanned images of the Journal of the Korea Information Science Society provided by KISTI. Evaluation of the recognition and retrieval rates showed that the system is practical.
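The abstract says retrieval uses recognition results up to rank k. The minimal sketch below illustrates that idea. It assumes the recognizer returns, for each character position, a list of candidates ordered by rank. A query then matches at a position if every query character appears among the top-k candidates of the corresponding recognized character. The names `rank_search` and `matches_at` are hypothetical and are not the paper's actual matching procedure.

```python
def matches_at(cands, query, start, k):
    """True if each query character is among the top-k candidates at its position."""
    return all(ch in cands[start + i][:k] for i, ch in enumerate(query))


def rank_search(cands, query, k):
    """Return every start position where the query matches using top-k candidates.

    cands: one ranked candidate list per recognized character (hypothetical format).
    """
    return [s for s in range(len(cands) - len(query) + 1)
            if matches_at(cands, query, s, k)]


# The recognizer misread the 2nd character, but the correct one is ranked 2nd.
cands = [["한", "헌"], ["극", "글"], ["문", "뮨"]]
print(rank_search(cands, "한글", 1))  # no hit: rank-1 output is "한극"
print(rank_search(cands, "한글", 2))  # hit at position 0
```

Raising k tolerates more recognition errors, which increases recall. It also admits spurious hits: here "헌극" also matches at k = 2, which lowers precision. This is one plausible reading of the "user-controllable recall and precision" the abstract mentions.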

Purchase Information Extraction Model From Scanned Invoice Document Image By Classification Of Invoice Table Header Texts (인보이스 서류 영상의 테이블 헤더 문자 분류를 통한 구매 정보 추출 모델)

  • Shin, Hyunkyung
    • Journal of Digital Convergence / v.10 no.11 / pp.383-387 / 2012
  • Developing an automated document management system for scanned invoice images is hampered by rigorous accuracy requirements for extracting monetary data. These requirements necessitate automatic validation of the extracted values against a generative invoice table model. A typical implementation uses internal constraints such as "amount = unit price × quantity". In this paper, we propose a novel invoice information extraction model with an improved auto-validation method that uses table header detection and column classification.
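The sketch below illustrates the two steps the abstract combines: classifying table header texts into column roles, then validating each row with "amount = unit price × quantity". It rests on several assumptions:

- The header classifier is a simple keyword lookup. The keyword lists in `HEADER_KEYWORDS` are hypothetical; the paper's classifier is not described here.
- The tolerance value is an assumed parameter.
- Values that cannot be parsed, for example OCR output like "l0", are treated as failing validation.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical header keywords per column role (English and Korean variants).
HEADER_KEYWORDS = {
    "quantity": ("qty", "quantity", "수량"),
    "unit_price": ("unit price", "unit cost", "단가"),
    "amount": ("amount", "total", "금액"),
}


def classify_headers(headers):
    """Map each column role to the index of the first header text that matches it."""
    roles = {}
    for idx, text in enumerate(headers):
        t = text.strip().lower()
        for role, keys in HEADER_KEYWORDS.items():
            if role not in roles and any(k in t for k in keys):
                roles[role] = idx
                break
    return roles


def validate_row(row, roles, tolerance=Decimal("0.01")):
    """Check amount == unit price * quantity; unparseable values fail validation."""
    try:
        qty = Decimal(row[roles["quantity"]].replace(",", ""))
        price = Decimal(row[roles["unit_price"]].replace(",", ""))
        amount = Decimal(row[roles["amount"]].replace(",", ""))
    except InvalidOperation:
        return False
    return abs(qty * price - amount) <= tolerance


roles = classify_headers(["Item", "Qty", "Unit Price", "Amount"])
print(validate_row(["Bolt", "3", "1,200.00", "3,600.00"], roles))
```

`Decimal` is used instead of `float` so that monetary products compare exactly.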

Methods of Classification and Character Recognition for Table Items through Deep Learning (딥러닝을 통한 문서 내 표 항목 분류 및 인식 방법)

  • Lee, Dong-Seok; Kwon, Soon-Kak
    • Journal of Korea Multimedia Society / v.24 no.5 / pp.651-658 / 2021
  • In this paper, we propose methods for classifying and recognizing characters in table items through deep learning. First, table areas in a document image are detected with a CNN. The table areas are then separated by separators such as vertical lines. Text in the document is recognized by a neural network that combines a CNN and an RNN. To correct character recognition errors, multiple candidates for the recognized result are provided for sentences with low recognition accuracy.
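Separating a detected table area at vertical lines can be illustrated with a column projection profile. This is a classical heuristic swapped in for illustration; the abstract does not say how the separators are found. The sketch assumes a binary table image (1 = ink). Any column that is at least `min_fill` ink is treated as a ruling line, and `min_fill` is an assumed parameter. The function returns the column spans between those lines.

```python
def find_vertical_lines(binary, min_fill=0.9):
    """Columns where at least min_fill of the rows are ink are treated as ruling lines."""
    height = len(binary)
    return [x for x in range(len(binary[0]))
            if sum(row[x] for row in binary) >= min_fill * height]


def split_columns(binary, min_fill=0.9):
    """Return (start, end) column spans between ruling lines (end is exclusive)."""
    lines = set(find_vertical_lines(binary, min_fill))
    spans, start = [], None
    for x in range(len(binary[0])):
        if x in lines:
            if start is not None:  # close the span that ends at this line
                spans.append((start, x))
                start = None
        elif start is None:  # first non-line column opens a new span
            start = x
    if start is not None:
        spans.append((start, len(binary[0])))
    return spans


table = [
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
]
print(split_columns(table))  # two cells between the three ruling lines
```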

A Study on the Pattern Segmentation and Classification in Specially Documented Images (제한된 문서 영상에서 패턴 분절과 구분 처리에 관한 연구)

  • 옥철호; 허도근; 진용옥
    • The Journal of Korean Institute of Communications and Information Sciences / v.14 no.6 / pp.663-674 / 1989
  • To design an automatic processing system for document images, methods for pattern segmentation and classification of document images are presented. Three techniques are implemented:
    • contour extraction using a first-order differential operator of Gaussian distribution functions;
    • image segmentation using chain codes;
    • pattern classification using second-order moments and a two-dimensional Rf distance (in the transform domain).
    Results on specially documented images show that characters, fingerprints, seals, etc. are classified well, which verifies the utility of the algorithms used.
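Second-order moments, one of the classification features named in the abstract, can be computed from a segmented binary pattern as sketched below.

- Assumption: the derived `elongation` ratio (smaller over larger eigenvalue of the moment matrix) is an illustrative feature, not necessarily one the paper uses.
- The Rf distance is not sketched.

Intuitively, a round seal gives an elongation near 1, and a thin stroke gives a value near 0.

```python
import math


def second_order_moments(binary):
    """Central second-order moments (mu20, mu02, mu11) of the ink pixels (value 1)."""
    pts = [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("pattern has no ink pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n
    return mu20, mu02, mu11


def elongation(binary):
    """Ratio of the minor to the major eigenvalue of the moment matrix (1 = isotropic)."""
    mu20, mu02, mu11 = second_order_moments(binary)
    half_sum = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    major, minor = half_sum + spread, half_sum - spread
    return 1.0 if major == 0 else minor / major


print(elongation([[1, 1], [1, 1]]))  # compact blob
print(elongation([[1, 1, 1]]))       # horizontal stroke
```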

Feature Extraction Method for the Character Recognition of the Low Resolution Document

  • Kim, Dae-Hak; Cheong, Hyoung-Chul
    • Journal of the Korean Data and Information Science Society / v.14 no.3 / pp.525-533 / 2003
  • In this paper, we introduce some existing preprocessing algorithms for character recognition and consider feature extraction methods for recognizing low-resolution documents. Low-resolution document images, including fax images, are frequently misclassified because of blurring, slope (skew) effects, noise, and so on. To overcome these difficulties in character recognition, we considered mesh features and contour direction code features, and we suggest a system for automatic character recognition.
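Mesh features are commonly computed by dividing a normalized character image into a grid and recording the ink density of each cell. The sketch below follows that common formulation, which may differ in detail from the paper's.

- Assumptions: a binary input image and a 4 × 4 default grid.
- Contour direction code features are not shown.

Averaging over cells makes the feature fairly tolerant of the isolated noise pixels and slight blurring typical of fax images.

```python
def mesh_features(binary, rows=4, cols=4):
    """Ink density (fraction of 1-pixels) in each cell of a rows x cols grid, row-major."""
    height, width = len(binary), len(binary[0])
    feats = []
    for i in range(rows):
        y0, y1 = i * height // rows, (i + 1) * height // rows
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            cell = [binary[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(mesh_features(glyph, rows=2, cols=2))
```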

Design and Implementation of Two Dimensional Iconic Image Indexing Method using Signatures (시그니쳐를 이용한 2차원 아이코닉 이미지 색인 방법의 설계 및 구현)

  • Chang, Ki-Jin; Chang, Jae-Woo
    • The Transactions of the Korea Information Processing Society / v.3 no.4 / pp.720-732 / 1996
  • Spatial match retrieval methods for iconic image databases recognize an image document as several icon symbols, so the iconic symbols are used as primary keys to index the image document. When a user requests content-based retrieval of images, a spatial match retrieval method converts the query image into iconic symbols and then retrieves relevant images by accessing the stored images. To support content-based image retrieval efficiently, in this paper we propose spatial match retrieval methods using signatures for iconic image databases. To this end, we design new index representations of two-dimensional iconic images and describe the implemented system. In addition, we compare the conventional 9-DLT method with our two-dimensional image retrieval method in terms of retrieval precision and recall. We show that our method is more efficient than the conventional method.
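Signature-based filtering can be illustrated with superimposed coding. Each spatial relation between a pair of icons is hashed to a few bits of a fixed-width signature. A stored image is a candidate for a query if every bit set in the query signature is also set in the image signature.

This sketch rests on several assumptions:

- The relation encoding (left/right/equal on x and y), `SIG_BITS`, `BITS_PER_TERM`, and the use of `zlib.crc32` as the hash are hypothetical choices. The paper's actual index representation is not given in the abstract.
- Signatures can produce false drops, so candidates would still need an exact spatial check.

```python
import zlib
from itertools import permutations

SIG_BITS = 64       # hypothetical signature width
BITS_PER_TERM = 3   # hypothetical number of bits set per relation term


def spatial_terms(icons):
    """Relation terms like 'tree<=house' for each ordered pair of (name, x, y) icons."""
    terms = set()
    for (a, xa, ya), (b, xb, yb) in permutations(icons, 2):
        rx = "<" if xa < xb else (">" if xa > xb else "=")
        ry = "<" if ya < yb else (">" if ya > yb else "=")
        terms.add(f"{a}{rx}{ry}{b}")
    return terms


def term_bits(term):
    """Superimposed code for one term: BITS_PER_TERM hashed bit positions."""
    sig = 0
    for i in range(BITS_PER_TERM):
        sig |= 1 << (zlib.crc32(f"{i}:{term}".encode()) % SIG_BITS)
    return sig


def signature(icons):
    sig = 0
    for term in spatial_terms(icons):
        sig |= term_bits(term)
    return sig


def may_match(query_sig, image_sig):
    """Candidate test: all query bits must be present in the image signature."""
    return (query_sig & image_sig) == query_sig


image = [("tree", 1, 1), ("house", 3, 1), ("car", 2, 4)]
query = [("tree", 0, 0), ("house", 5, 0)]  # tree left of house, same row
print(may_match(signature(query), signature(image)))
```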

Structure Recognition Method of Invoice Document Image for Document Processing Automation (문서 처리 자동화를 위한 인보이스 이미지의 구조 인식 방법)

  • Dong-seok Lee; Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.11-19 / 2023
  • In this paper, we propose methods for recognizing the structure of invoice documents and for generating an electronic spreadsheet document. The text and location of each word block are recognized by a deep-learning-based optical character recognition (OCR) engine. Word blocks in the same row and in the same column are identified from their coordinates, and the document area is divided according to the arrangement of the word blocks. The character recognition results are then entered into the spreadsheet based on the document structure. In simulations, item placement with the proposed method achieves an average accuracy of 92.30%.
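Grouping word blocks into rows and columns from their coordinates can be sketched with simple tolerance-based clustering.

- Assumptions: each OCR block is given as `(text, x, y)` with the top-left corner in pixels, and the tolerances `row_tol` and `col_tol` are hypothetical parameters.
- CSV output stands in for the spreadsheet document.
- The paper's exact grouping rules are not described in the abstract.

```python
import csv
import io


def column_anchors(blocks, tol=20):
    """Cluster block x-coordinates; each cluster's leftmost x becomes a column anchor."""
    anchors = []
    for x in sorted(b[1] for b in blocks):
        if not anchors or x - anchors[-1] > tol:
            anchors.append(x)
    return anchors


def to_grid(blocks, row_tol=10, col_tol=20):
    """Place (text, x, y) word blocks into a row/column grid of strings."""
    anchors = column_anchors(blocks, col_tol)
    grid, row_y = [], None
    for text, x, y in sorted(blocks, key=lambda b: (b[2], b[1])):
        if row_y is None or y - row_y > row_tol:  # start a new row
            grid.append([""] * len(anchors))
            row_y = y
        col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - x))
        grid[-1][col] = f"{grid[-1][col]} {text}".strip()
    return grid


def grid_to_csv(grid):
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


blocks = [("Item", 10, 5), ("Qty", 200, 6), ("Bolt", 12, 40), ("3", 205, 41)]
print(grid_to_csv(to_grid(blocks)))
```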

A Study on u-paperless and secure credit card delivery system development

  • Song, Yeongsim; Jang, Jinwook; Jeong, Jongsik; Ahn, Taejoon; Joh, Joowan
    • Journal of the Korea Society of Computer and Information / v.22 no.4 / pp.83-90 / 2017
  • In the past, when a credit card was delivered, the customer signed a paper postal agreement and receipt. The collected documents were sent back to the card company through a reorganization process, and the card company scanned them to check for errors and kept them in a document storage room. This process is inefficient in cost and personnel because of delivery time, document printing, document sorting, image scanning, inspection work, and storage. In addition, the risk of personal data leakage is very high while personal information is being handled. In the proposed service, the recipient signs the postal agreement and receipt on a mobile device as an image instead of on paper when receiving the credit card, and the signed image is sent automatically to the card company's server. We designed a system that reduces the cost of paper documents, complicated work procedures, and delivery time while protecting personal information. In this study, we developed a "u-paperless" and secure credit card delivery system that applies electronic document and security systems.

Document Image Binarization Using a Water Flow Model (Water Flow Model을 이용한 문서 영상의 이진화)

  • Kim, In-Gwon; Jeong, Dong-Uk; Song, Jeong-Hui; Park, Rae-Hong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.1 / pp.19-32 / 2001
  • This paper proposes a locally adaptive thresholding method based on a water flow model, in which the image surface is treated as a three-dimensional (3-D) terrain. To extract characters from the background, water is poured onto the terrain surface; it flows down to the lower regions of the terrain and fills the valleys. The amount of filled water is then thresholded. The method is applied to gray-level document images consisting of characters and backgrounds, and because it is based on a water flow model, it behaves as a locally adaptive threshold. Computer simulations with synthetic and real document images show that the proposed method yields effective adaptive thresholding results for binarizing document images.
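The "fill the valleys, then threshold the water depth" idea can be illustrated with a priority-flood depression fill. This is a swapped-in technique, not the paper's method. The paper pours a finite amount of water and simulates its flow, whereas priority-flood fills every valley up to its spill level.

In this sketch:

- Dark characters are valleys in the gray-level terrain, and water drains off the image border.
- A pixel is labeled ink when its water depth exceeds `threshold`, which is an assumed parameter.
- Because the spill level comes from the surrounding background, a stroke on a bright region is still detected even if it is brighter than a dark background elsewhere.

```python
import heapq


def fill_depressions(img):
    """Priority-flood: water level each pixel holds when every valley is filled to its spill point."""
    h, w = len(img), len(img[0])
    level = [[None] * w for _ in range(h)]
    heap = []
    for y in range(h):  # seed with border pixels, where water drains away
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                level[y][x] = img[y][x]
                heapq.heappush(heap, (img[y][x], y, x))
    while heap:  # always expand from the lowest known water level
        lv, y, x = heapq.heappop(heap)
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and level[ny][nx] is None:
                level[ny][nx] = max(img[ny][nx], lv)
                heapq.heappush(heap, (level[ny][nx], ny, nx))
    return level


def binarize(img, threshold):
    """1 where the filled water depth exceeds threshold (character), else 0."""
    level = fill_depressions(img)
    return [[1 if level[y][x] - img[y][x] > threshold else 0
             for x in range(len(img[0]))] for y in range(len(img))]


# Uneven background: dark left half (100), bright right half (220), stroke of 150 on the right.
page = [[100] * 4 + [220] * 4 for _ in range(5)]
page[2][5] = 150
print(binarize(page, threshold=40)[2])
```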

Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom; Mandl, Thomas; Kim, Do Wan
    • International Journal of Contents / v.13 no.4 / pp.70-79 / 2017
  • Images are an important element in patents, and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents, partly because image processing is technically demanding and partly because patent images typically combine visual parts with text and numbers. This study suggests two methods based on image processing: the Scale-Invariant Feature Transform (SIFT) algorithm and Optical Character Recognition (OCR). The first method, based on SIFT, uses image feature points; through feature matching, it calculates the similarity between documents containing these images. In the second method, OCR is used to extract text from the images. Using the numbers extracted from an image, the corresponding related text can be located within the text passages, and document similarity can then be calculated from that extracted text. A comparison of the suggested methods with an existing text-only similarity method demonstrates their feasibility. Additionally, the correlation between the two similarity measures is low, which shows that they capture different aspects of the patent content.
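The SIFT-based method scores images through feature matching. The matching step can be sketched with Lowe's ratio test on precomputed descriptors.

- Assumption: descriptors have already been extracted, for example with OpenCV's `cv2.SIFT_create().detectAndCompute`. Real SIFT descriptors are 128-dimensional; the toy example uses 2-D vectors.
- The normalization in `sift_similarity` (good matches divided by the smaller descriptor count) and the 0.75 ratio are hypothetical choices, not necessarily the paper's formula.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Count descriptors in desc_a whose nearest neighbour in desc_b passes Lowe's ratio test."""
    if len(desc_b) < 2:
        return 0
    good = 0
    for d in desc_a:
        nearest, second = sorted(math.dist(d, e) for e in desc_b)[:2]
        if nearest < ratio * second:  # distinctive match, not an ambiguous one
            good += 1
    return good


def sift_similarity(desc_a, desc_b, ratio=0.75):
    """Hypothetical score: ratio-test matches divided by the smaller descriptor count."""
    n = min(len(desc_a), len(desc_b))
    return ratio_test_matches(desc_a, desc_b, ratio) / n if n else 0.0


a = [(0.0, 0.0), (10.0, 10.0)]
b = [(0.1, 0.0), (10.0, 10.2), (50.0, 50.0)]
print(sift_similarity(a, b))
```

The ratio test discards ambiguous matches, where two candidates are almost equally close. Such matches are common in patent drawings with repeated line patterns.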