• Title/Summary/Keyword: document image

Documentation of Printed Hangul Images of the Selected Area by Finger Movement (손가락 이동에 의해 선택된 영역의 인쇄체 한글 영상 문서화)

  • Beak, Seung-Bok
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.4 / pp.306-310 / 2002
  • In this paper, we implement a system that converts printed Korean (Hangul) character images, taken from an arbitrary region that the user marks by moving a finger over a Hangul document, into editable characters and sends them to a word processor. In the image preprocessing step, the hand region is separated from the document region. The centroid of the hand is obtained by the maximum circular movement method. The system recognizes the hand with the circular pattern vector algorithm, locates the finger using the distance spectrum, and extracts the character image region selected by the finger movement; it then splits that region into individual characters by applying a histogram between the Hangul characters. Characters of various sizes are normalized to a standard size. To convert the character images in the user-selected region into editable characters, we use a circular pattern vector algorithm combined with fuzzy inference, which recognizes each character by comparing the feature vectors of the input character with those of the standard pattern characters.
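
The character-splitting step mentioned in this abstract can be illustrated with a vertical projection histogram. The sketch below is a generic version of that idea, not the authors' implementation: the function name `split_columns` and the 0/1 list-of-rows image format are assumptions made for illustration.

```python
def split_columns(binary_rows):
    """Split a binarized text line into character column spans.

    binary_rows: list of equal-length rows, 1 = ink, 0 = background
    (an assumed input format). Returns (start, end) column ranges,
    end exclusive.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # Vertical projection histogram: number of ink pixels in each column.
    histogram = [sum(row[x] for row in binary_rows) for x in range(width)]

    spans, start = [], None
    for x, count in enumerate(histogram):
        if count > 0 and start is None:
            start = x                    # a run of ink columns begins
        elif count == 0 and start is not None:
            spans.append((start, x))     # an empty column closes the run
            start = None
    if start is not None:
        spans.append((start, width))     # run reaches the right edge
    return spans


line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
print(split_columns(line))  # [(0, 2), (4, 5), (6, 8)]
```

A Hangul syllable can contain an empty column between its components, so a naive splitter like this one would likely need an extra merging step based on expected character width.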

The Construction and Common Use of Old Document DB in the Foreign Countries (해외 소장 고문헌의 DB구축과 공동활용 방안)

  • Kang, Soon-Ae
    • Journal of the Korean Society for Library and Information Science / v.42 no.3 / pp.61-79 / 2008
  • This paper investigates three aspects of the construction and common use of databases (DBs) of old documents held in foreign countries: i) the processing of old documents, ii) the problems of the old-document DB systems and their improvement, and iii) the common use of old-document DBs. The results are as follows. The National Library of Korea (NLK) copied old documents held in foreign countries from 1982 to 2006 and published a brief catalog. The Reogang Publishing company issued four volumes of catalogs of old documents in Japan. The National Research Institute of Cultural Heritage (NRICH) investigated old books and published catalogs for several organizations in Japan, America, France, and elsewhere. The National Institute of Korean History (NIKH) investigated old archives and published catalogs for several organizations in Japan. For the DB systems, the paper describes the characteristics of the Korean Old and Rare Collection Information System (KORCIS) of the NLK, the Old Books Cultural Heritage in Overseas System of the NRICH, and the Korea History DB System and MF Catalog/Image System of the NIKH, examines their problems, and suggests alternatives. For common use, the KORMARC format and the description rules (draft) for archives should be revised to adopt new standards such as the KS editions, and all institutes involved should follow the standards strictly when creating bibliographic records and digitizing texts. Specialists in old documents need to be educated and trained. A government organization should be established to supervise the whole process of developing technology for sharing digitized resources, using contents, and cooperating with related international organizations and institutes.

Supporting Media using XML-based Messages on Online Conversational Activity (온라인 대화 행위에서 XML 기반 메시지를 이용한 미디어 지원)

  • Kim, Kyung-Deok
    • The KIPS Transactions:PartB / v.11B no.1 / pp.91-98 / 2004
  • This paper proposes a method for supporting various media in online conversational activity using XML (eXtensible Markup Language). The method converts media information into XML-based messages and handles them like conventional text-based messages. The XML-based messages are merged into a single XML document, and the server then generates an HTML document from that XML document and an XSLT document. A user at each client can play or present media through the hyperlinks in the HTML document that are associated with the media information. The method supports various media (text, image, audio, video, documents, etc.) and, because XML tags can be extended and modified, efficient maintenance of font size, color, and style in messages. As an application, the paper implements a client-server system that supports media in online conversational activity. A user at each client enters text or media-based messages through a Java applet and servlet, and the conversational messages on every user's interface are updated automatically whenever a user enters a new message. Media in conversational messages are played or presented when a user clicks the hyperlink. Applications of the media presentation include distance learning, online games, and collaboration.
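
The message flow in this abstract (media information wrapped in XML messages, merged into one XML document, then rendered as HTML with hyperlinks) can be sketched as follows. This is only an illustration under stated assumptions: the element and attribute names (`conversation`, `message`, `user`, `type`, `href`) are hypothetical, and a plain Python function stands in for the XSLT transformation that the paper performs on the server.

```python
import xml.etree.ElementTree as ET


def make_message(user, kind, body, href=None):
    """Build one XML message; the tag and attribute names are hypothetical."""
    msg = ET.Element("message", {"user": user, "type": kind})
    if href is not None:
        msg.set("href", href)  # link to the media file, if any
    msg.text = body
    return msg


def to_html(conversation):
    """Render the merged XML document as an HTML list.

    Stand-in for the XSLT step: media messages become hyperlinks,
    text messages are shown inline.
    """
    ul = ET.Element("ul")
    for msg in conversation.iter("message"):
        li = ET.SubElement(ul, "li")
        li.text = msg.get("user") + ": "
        if msg.get("href"):
            link = ET.SubElement(li, "a", {"href": msg.get("href")})
            link.text = msg.text
        else:
            li.text += msg.text
    return ET.tostring(ul, encoding="unicode")


# Merge text and media messages into a single XML document.
conversation = ET.Element("conversation")
conversation.append(make_message("alice", "text", "hello"))
conversation.append(
    make_message("bob", "audio", "greeting.wav", href="media/greeting.wav")
)

html = to_html(conversation)
print(html)
```

Treating a media message as an ordinary message with one extra attribute is what lets the same pipeline handle text and media alike.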

Automatic Generation of Training Character Samples for OCR Systems

  • Le, Ha;Kim, Soo-Hyung;Na, In-Seop;Do, Yen;Park, Sang-Cheol;Jeong, Sun-Hwa
    • International Journal of Contents / v.8 no.3 / pp.83-93 / 2012
  • In this paper, we propose a novel method that automatically generates real character images to familiarize existing OCR systems with new fonts. First, we generate synthetic character images using a simple degradation model. The synthetic data is used to train an OCR engine, and the trained OCR engine is used to recognize and label real character images segmented from ideal document images. Since the OCR engine cannot recognize every real character image accurately, a substring matching method is employed to fix wrongly labeled characters by comparing two strings: one is the string formed by the characters recognized in an ideal document image, and the other is the ordered string of the characters that we want to train and recognize. Based on our method, we build a system that automatically generates the 2,350 most common Korean characters and 117 alphanumeric characters from new fonts. The ideal document images used in the system are postal envelope images with characters printed in ascending order of their codes. The proposed system achieved a labeling accuracy of 99%. We therefore believe that our system is effective in facilitating the generation of numerous character samples to improve the recognition rate of existing OCR systems for fonts on which they have never been trained.
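
The label-correction idea in this abstract, comparing the recognized string with the known ordered string, can be sketched with Python's standard `difflib`. This is an assumed stand-in for the paper's substring matching method, whose details the abstract does not give; `correct_labels` is a hypothetical name.

```python
from difflib import SequenceMatcher


def correct_labels(recognized, expected):
    """Fix OCR labels by aligning them with the known character order.

    recognized: labels assigned by the OCR engine, one per segmented image.
    expected:   the characters in the order they were printed.
    Only same-length mismatches are corrected, so every segmented image
    keeps exactly one label.
    """
    corrected = list(recognized)
    matcher = SequenceMatcher(None, recognized, expected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # The surrounding matches anchor this span, so trust the
            # expected characters over the OCR output.
            corrected[i1:i2] = expected[j1:j2]
    return "".join(corrected)


print(correct_labels("ABCDXFGH", "ABCDEFGH"))  # ABCDEFGH
```

Spans where the two strings differ in length (for example, a character lost during segmentation) are left untouched here, because the one-to-one mapping between images and labels is ambiguous in that case.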

Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
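
The abstract mentions ensemble retrievers and re-ranking without saying how the retrievers' results are combined. One common way to merge ranked lists is reciprocal rank fusion, sketched below purely as an illustration; the choice of this technique, the constant `k=60`, and the document ids are assumptions, not details from the paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) in every list that contains it;
    documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a keyword retriever
vector_hits = ["doc_b", "doc_c", "doc_a"]    # e.g. from an embedding retriever
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_c']
```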

The Project and Prospects of Old Documents Information Systems in Korea (한국 고문헌 정보시스템의 구축 및 전망)

  • Kang Soon-Ae
    • Journal of the Korean Society for Library and Information Science / v.31 no.4 / pp.83-112 / 1997
  • The purpose of this paper is to describe the issues involved in planning the best information system for Korean old books. It analyzes: i) the scope of the definition of old books, ii) the characteristics of old documents and the current state of their processing, iii) the scope of automation and the building up of the library institution, iv) the construction of Korean old-book information systems, v) case studies, and vi) the evaluation and vision of the system. The old document information system has been organized on the basis of a library network system led by the National Central Library, and the implemented system has subsystems such as a cataloging system, an annotation system, a full-text or image-based system, and a retrieval system. As case studies, two examples are presented, built at the National Central Library and at Sung Kyun Kwan University. Finally, the paper provides evaluation criteria and a vision for libraries designing old document information systems.

Color Seal Extraction of Document Images using An Extended Fuzzy Integral (확장된 퍼지적분을 이용한 문서영상의 컬러낙관 추출)

  • Park, In-Kyu;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.9 no.1 / pp.31-37 / 2009
  • This paper applies an extended fuzzy integral to the selective extraction of color clusters characterized by a particular hue from color document images. The conventional fuzzy integral evaluates the importance of items from their negative aspects through the min operator, which leaves discontinuous parts in the extracted seals. To cope with this drawback, the presented approach takes the integral aspects into account through the center of area method, which uses neighborhood information to detect the seals and makes the resulting images more robust. Finally, the framework is successfully tested on a data set of documents from a real application.

Auto Detection System of Personal Information based on Images and Document Analysis (이미지와 문서 분석을 통한 개인 정보 자동 검색 시스템)

  • Cho, Jeong-Hyun;Ahn, Cheol-Woong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.5 / pp.183-192 / 2015
  • This paper proposes the Personal Information Auto Detection (PIAD) system, which mobile service providers can use to prevent leakage of personal information in document and image files. The proposed system automatically detects images and documents that contain personal information and shows the result to the user. PIAD consists of a selection step, for fast and accurate image retrieval, and an analysis step, which combines SURF, erosion and dilation, and the FindContours algorithm. With the selection and analysis steps, the proposed PIAD system achieved more than 98% accuracy, detecting 267 of 272 images.

A Study on the Characteristics and Images of Rainbow Colors For Fashion Design (패션 디자인을 위한 무지개 색의 특성과 이미지 고찰)

  • 김지언;김영인
    • Journal of the Korean Society of Costume / v.54 no.5 / pp.125-138 / 2004
  • This study aims to define the characteristics and images of rainbow-colored fashion by examining the theoretical bases of rainbow colors and analyzing rainbow-colored fashion images in historical materials, Western and folk costumes, and modern fashion design. Careful consideration of rainbow-colored fashion makes it possible to develop innovative approaches to fashion design that satisfy the color needs of designers and colorists. To this end, a document study and a survey study were carried out. The results are as follows. In the document study, rainbow-colored fashion was traced back to ancient Egypt. Korean saekdong and the Indian poncho are further examples of rainbow-colored fashion. Rainbow-colored clothing was worn as ornament by persons of rank at principal ceremonies. In the survey study, the clothing perception characteristics of rainbow-colored fashion were analyzed. The main perception factors in rainbow-colored fashion are 'closed form', 'whole', 'indeterminate', 'rounded', and 'planar separation'. The factors that affect the perception of rainbow-colored fashion are the 'closed form' and 'indeterminate' characteristics. Rainbow-colored fashion images and clothing perception characteristics can be classified into four main images: Vigorous, Colorful/fairy, Fresh, and Mysterious/brilliant. This study thus systematizes the characteristics and images of rainbow colors. The results make it possible to apply rainbow colors to fashion design efficiently, because the suggested design elements and color palettes cover the three basic fashion design elements: color, texture, and form.

Adaptive thresholding for two-dimensional barcode images using two thresholds and the integral image (이중 문턱 값과 적분영상을 이용한 2차원 바코드 영상의 적응적 이진화)

  • Lee, Yeon-Kyung;Yoo, Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.11 / pp.2453-2458 / 2012
  • In this paper, we propose an adaptive thresholding method to binarize two-dimensional barcode images. Adaptive thresholding methods, which minimize the effects of lighting, convert an original image into a binary image and are commonly applied to document image binarization. These methods, however, have difficulty determining the box size used for adaptive thresholding, so they are inappropriate for recognizing two-dimensional barcode images. To overcome this, we analyze the problem and propose a new adaptive thresholding method using the integral image. To show the effectiveness of our method, we compare it with well-known existing methods in terms of visual quality and processing time. The experimental results indicate that the proposed method is superior to the existing methods.
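
To show the role the integral image plays in adaptive thresholding, here is a minimal sketch of a generic local-mean threshold. It is not the paper's two-threshold method, which the abstract does not specify; the `window` and `ratio` parameters and the list-of-rows grayscale format are assumptions. The `window` parameter corresponds to the box size that the abstract identifies as hard to choose.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold(img, window=3, ratio=0.85):
    """Binarize img: 1 if a pixel is at least `ratio` of its local mean.

    The local sum over each window costs four lookups in the integral
    image, regardless of the window size.
    """
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            area = (y1 - y0) * (x1 - x0)
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Same test as img[y][x] > ratio * (total / area), without division.
            out[y][x] = 1 if img[y][x] * area > total * ratio else 0
    return out


gray = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
print(adaptive_threshold(gray))  # [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```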