Search | Korea Science

A Study on the Hangul Recognition Using Hough Transform and Subgraph Pattern (Hough Transform과 부분 그래프 패턴을 이용한 한글 인식에 관한 연구)

구하성;박길철
- Journal of the Korea Institute of Information and Communication Engineering / v.3 no.1 / pp.185-196 / 1999
This paper proposes a new off-line recognition system that uses subgraph patterns and a neural network. After thinning is applied to the input characters, a balancing step that also eliminates position-dependent noise is performed. As the first step of recognition, circular elements are extracted and recognized. From the subblock Hough transform (HT), spatial feature points such as endpoints, flex points, and bridge points are extracted, and a subgraph pattern is formed from the relations among them. A region where a vowel can exist is allocated and candidate vowel points are extracted; the vowel is then recognized using the subgraph pattern dictionary. The same method is applied to extract horizontal vowels, which are recognized through a simple structural analysis. To verify the subgraph-based recognition, experiments were run on printed characters in the most frequently used fonts, Myungjo and Gothic, and on handwritten characters. The recognition rate was 98.9% for the Gothic font, 98.2% for the Myungjo font, and 92.5% for handwritten characters. The overall recognition rate was 94.8% for multi-font recognition with mixed handwritten and printed characters.
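The abstract above relies on a Hough transform to find stroke structure. The following is a minimal sketch of the standard line Hough transform only, not the paper's subblock HT; the function name `hough_peak`, the one-degree angle step, and the toy stroke are assumptions made for illustration.

```python
import math

def hough_peak(points, n_theta=180):
    """Vote in (rho, theta) space and return (votes, rho, theta_deg) of the top bin."""
    votes = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    # Nearby angles can tie; max() keeps the first bin reached.
    (rho, t), count = max(votes.items(), key=lambda kv: kv[1])
    return count, rho, 180.0 * t / n_theta

# Toy input: a vertical stroke at x = 3 plus one stray noise pixel.
stroke = [(3, y) for y in range(10)] + [(7, 2)]
count, rho, theta_deg = hough_peak(stroke)
print(count, rho, theta_deg)
```

The noise pixel never lands in the winning bin, which is why voting schemes like this tolerate isolated noise after thinning.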

Comment on the Copyrightability of Font-files as Computer Program (글자체파일의 컴퓨터프로그램저작물성 판단에 대한 비판)

Jeong, Jin-Keun
- Journal of Software Assessment and Valuation / v.15 no.2 / pp.17-24 / 2019
Unauthorized use of font files is a social problem. Meanwhile, the Korean courts have recognized font files as computer programs. But is a font file a computer program? This view stems from a failure to distinguish computer programs from data. Experts, by contrast, do not regard font files as computer programs. In this regard, a 2014 case held that INI files are not computer programs but merely data files. The position taken by the Supreme Court in 2001 therefore only makes it harder to distinguish computer programs from data. The Supreme Court's decision needs to be changed, and a new legal system should be put in place to protect font files.
https://doi.org/10.29056/jsav.2019.12.03

A Study on Influence of Stroke Element Properties to find Hangul Typeface Similarity (한글 글꼴 유사성 판단을 위한 획 요소 속성의 영향력 분석)

Park, Dong-Yeon;Jeon, Ja-Yeon;Lim, Seo-Young;Lim, Soon-Bum
- Journal of Korea Multimedia Society / v.23 no.12 / pp.1552-1564 / 2020
As fonts of various styles have come into use, problems have arisen such as output errors caused by uninstalled fonts and difficulty in recognizing fonts. To solve these problems, research on font recognition and recommendation has been actively conducted. However, research on Hangul fonts remains at a basic level. Therefore, as a step toward automating Hangul font similarity comparison, we analyze the influence of each stroke element property. First, we select seven representative properties based on Hangul stroke shape elements. Second, we design a calculation model to compare the similarity between fonts. Third, we analyze the effect of each stroke element through the cosine similarity between users' evaluations and the results of the model. The results show no significant difference in the individual effect of each representative property, and similarity comparison became more accurate as more representative properties were used.
https://doi.org/10.9717/kmms.2020.23.12.1552
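The abstract above compares user evaluations with model output through cosine similarity. A minimal sketch of that measure follows; the score vectors are made-up illustrative values, not data from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical similarity ratings for four font pairs.
user_scores = [4.0, 2.0, 5.0, 1.0]     # e.g. ratings given by users
model_scores = [0.8, 0.4, 1.0, 0.2]    # e.g. scores from a calculation model
agreement = cosine_similarity(user_scores, model_scores)
print(agreement)
```

Because the measure ignores overall scale, a model whose scores are proportional to the user ratings scores close to 1 even when the two use different ranges.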

Font Classification of English Printed Character using Non-negative Matrix Factorization (NMF를 이용한 영문자 활자체 폰트 분류)

Lee, Chang-Woo;Kang, Hyun;Jung, Kee-Chul;Kim, Hang-Joon
- Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.2 / pp.65-76 / 2004
Today, most documents are electronically produced and their paleography is digitized by imaging, resulting in a tremendous number of electronic documents in the form of images. To process these document images, many methods of document structure analysis and recognition have been proposed, including font classification. Accordingly, this paper proposes a font classification method for document images that uses non-negative matrix factorization (NMF), which can learn part-based representations of objects. In the proposed method, spatially total features of font images are automatically extracted using NMF, and the appropriateness of the features for specifying each font is then investigated. The proposed method is expected to improve the performance of optical character recognition (OCR), document indexing, and retrieval systems when such systems adopt a font classifier as a preprocessor.
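To make the NMF step in the abstract above concrete, here is a minimal sketch using the well-known multiplicative update rules for the squared-error objective. It is not the paper's implementation: the helper names, the rank, the iteration count, and the toy 4-pixel "images" are all assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius distance between V and the product W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative V (images x pixels) as W (images x rank) * H (rank x pixels)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy data: four flattened images built from two parts, [1,1,0,0] and [0,0,1,1].
v = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [2, 2, 0, 0]]
w0, h0 = nmf(v, rank=2, iters=0)   # random starting point (same seed)
w, h = nmf(v, rank=2, iters=200)
print(frob_error(v, w0, h0), frob_error(v, w, h))
```

The rows of `h` play the role of learned parts and the rows of `w` are the per-image features that a font classifier could consume; both stay non-negative because the updates only multiply by non-negative ratios.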

Distinction of Korean and English Characters from Multi-font Images for the Recognition of Mixed Document Composed of Korean and English (한영 혼용문서 인식을 위한 다중 폰트 이미지로부터 한글과 영어의 구별)

전일수
- Journal of Korea Society of Industrial Information Systems / v.4 no.3 / pp.52-58 / 1999
This paper proposes an algorithm for distinguishing Korean and English characters that can be applied to multi-size and multi-font images. The proposed algorithm distinguishes them using the height-to-width ratio of each character, the number of connected components, the presence or absence of a stroke in the upper-left area, and the detection of bars in the input image. Bars are detected in the order left, upper, right, and lower. In experiments, the proposed method performed well on the Myungjo, Sinmyungjo, Gothic, and Kungseo fonts of the Hangul word processor, which is widely used for writing documents.
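Two of the cues named in the abstract above, the height-to-width ratio and the number of connected components, can be sketched as follows. The toy glyph bitmaps and the choice of 8-connectivity are assumptions; the paper's actual decision thresholds are not given in the abstract, so none are invented here.

```python
from collections import deque

def count_components(img):
    """Count 8-connected groups of 1-pixels in a binary image (list of rows)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

def height_to_width(img):
    """Height-to-width ratio of the bounding box of the 1-pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return (rows[-1] - rows[0] + 1) / (cols[-1] - cols[0] + 1)

def to_image(lines):
    return [[int(ch) for ch in line] for line in lines]

# Toy glyphs: a single tall stroke, and a three-part syllable-like block.
latin_like = to_image(["010", "010", "010", "010", "010"])
hangul_like = to_image(["11010", "01010", "00000", "11111"])
print(count_components(latin_like), height_to_width(latin_like))
print(count_components(hangul_like), height_to_width(hangul_like))
```

In this toy case the syllable-like block is wider than it is tall and splits into several components, while the single stroke is tall and forms one component.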

Font Classification using Back Propagation Algorithm (오류 역전파 알고리즘을 이용한 영문자의 폰트 분류 방법에 관한 연구)

Jung Minchul
- Journal of Intelligence and Information Systems / v.10 no.2 / pp.65-77 / 2004
This paper presents an a priori, local font classification method. The method uses ascenders, descenders, and serifs extracted from a word image. Gradient features of these sub-images are extracted and used as input to a neural network classifier, which produces the font classification results. The classifier determines 2 font styles (upright or slant), 3 font groups (serif, sans-serif, or typewriter), and 7 font names (PostScript fonts: Avant Garde, Helvetica, Bookman, New Century Schoolbook, Palatino, Times, and Courier). The proposed method makes possible an OCR system consisting of various font-specific character segmentation tools and various mono-font character recognizers. Experiments show that font classification accuracy reaches about 95.4 percent even with severely touching characters. The technique developed for the 7 selected fonts in this paper can be applied to other fonts.

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

Liu, Jingxin;Cheng, Jieren;Peng, Xin;Zhao, Zeli;Tang, Xiangyan;Sheng, Victor S.
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1833-1848 / 2022
Named entity recognition (NER) is an important basic task in natural language processing (NLP). Recently, deep learning approaches that extract word segmentation or character features have proved effective for Chinese named entity recognition (CNER). However, because these approaches extract only some of the available features, they do not mine textual information from multiple perspectives and dimensions, so the model cannot fully capture semantic features. To tackle this problem, we propose a novel Multi-view Semantic Feature Fusion Model (MSFM). The proposed model consists of two core components: a Multi-view Semantic Feature Fusion Embedding Module (MFEM) and a Multi-head Self-Attention Mechanism Module (MSAM). Specifically, the MFEM extracts character features, word boundary features, radical features, and pinyin features of Chinese characters. The acquired font shape, font sound, and font meaning features are fused to enhance the semantic information of Chinese characters at different granularities. The MSAM is then used to capture the dependencies between characters in a multi-dimensional subspace in order to better understand the semantic features of the context. Extensive experimental results on four benchmark datasets show that our method improves the overall performance of the CNER model.
https://doi.org/10.3837/tiis.2022.06.004

Character Recognition Algorithm using Accumulation Mask

Yoo, Suk Won
- International Journal of Advanced Culture Technology / v.6 no.2 / pp.123-128 / 2018
The learning data consists of 100 characters in 10 different fonts, and the test data consists of 10 characters in a new font that is not used in the learning data. To account for the variety of the learning data across fonts, 10 learning masks are constructed by accumulating the pixel values of the same character in the 10 different fonts. This process smooths out minute differences between the fonts. After the maximum value of each learning mask is found, the test data is expanded by multiplying it by these maximum values. The algorithm then calculates the sum of the differences between corresponding pixel values of the expanded test data and each learning mask. The learning mask with the smallest of these 10 sums is selected as the recognition result for the test data. The proposed algorithm can recognize various types of fonts, and the learning data can be modified easily by adding a new font. The recognition process is also easy to understand, and the algorithm produces satisfactory character recognition results.
https://doi.org/10.17703/IJACT.2018.6.2.123
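The accumulation-mask procedure described in the abstract above can be sketched on toy data. This is an illustration under stated assumptions, not the author's code: the 3x3 bitmaps, the two-font training set, and the use of absolute pixel differences are all choices made here, since the abstract does not specify them.

```python
def build_mask(samples):
    """Accumulate pixel values of one character rendered in several fonts."""
    height, width = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) for c in range(width)]
            for r in range(height)]

def recognize(test, masks):
    """Return the label whose mask is closest to the expanded test image."""
    best_label, best_score = None, None
    for label, mask in masks.items():
        peak = max(max(row) for row in mask)  # maximum value of this mask
        # Expand the test image by the peak, then sum absolute differences
        # (absolute value is an assumption of this sketch).
        score = sum(abs(peak * test[r][c] - mask[r][c])
                    for r in range(len(mask)) for c in range(len(mask[0])))
        if best_score is None or score < best_score:
            best_label, best_score = label, score
    return best_label

# Two toy fonts per character.
i_fonts = [[[0, 1, 0], [0, 1, 0], [0, 1, 0]],
           [[1, 1, 0], [0, 1, 0], [0, 1, 0]]]
l_fonts = [[[1, 0, 0], [1, 0, 0], [1, 1, 1]],
           [[1, 0, 0], [1, 0, 0], [1, 1, 0]]]
masks = {"I": build_mask(i_fonts), "L": build_mask(l_fonts)}

# An "I" in an unseen font with a base serif.
unseen = [[0, 1, 0], [0, 1, 0], [1, 1, 1]]
result = recognize(unseen, masks)
print(result)
```

Adding a font under this scheme only means adding one more sample to the list passed to `build_mask`, which matches the abstract's remark that the learning data is easy to extend.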

Low-Quality Banknote Serial Number Recognition Based on Deep Neural Network

Jang, Unsoo;Suh, Kun Ha;Lee, Eui Chul
- Journal of Information Processing Systems / v.16 no.1 / pp.224-237 / 2020
Recognition of banknote serial numbers is one of the important functions of an intelligent banknote counter and can be used for various purposes. However, previous character recognition methods are of limited use because of the font type of the banknote serial number, variation caused by the soiled state of the notes, and recognition speed. In this paper, we propose an aspect ratio based character region segmentation method and a convolutional neural network (CNN) based banknote serial number recognition method. To detect the character region, the character area is determined from the aspect ratio of each character in the serial number candidate area after banknote area detection and de-skewing are performed. We then designed and compared four types of CNN models and determined the best model for serial number recognition. Experimental results showed a per-character recognition accuracy of 99.85%. It was also confirmed that data augmentation improves recognition performance. The banknotes used in the experiment were Indian rupee notes, which are badly soiled and use an unusual character font, so the method can be regarded as performing well. Recognition was also fast enough to run in real time on a device that counts 800 banknotes per minute.
https://doi.org/10.3745/JIPS.04.0160

Improvement OCR Algorithm for Efficient Book Catalog Retrieval Technology (효과적인 도서목록 검색을 위한 개선된 OCR알고리즘에 관한 연구)

HeWen, HeWen;Baek, Young-Hyun;Moon, Sung-Ryong
- Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.1 / pp.152-159 / 2010
Existing character recognition algorithms recognize characters only under simple conditions. Their recognition rates often drop drastically when the input document image is of low quality or contains rotated text or text in various fonts and sizes, because of external noise or data loss. This paper proposes an optical character recognition algorithm for catalog retrieval that uses bicubic interpolation to handle input images with rotated or blurred text and text in various fonts and sizes. The proposed algorithm consists of a detection part and a recognition part. The detection part applies the Roberts and Hausdorff distance algorithms to detect the book catalog correctly. The recognition part applies bicubic interpolation to compensate for data loss caused by low quality and by text in various fonts and sizes. The interpolated image is then rotated to correct slant. Experimental results show that the proposed method effectively improves both the recognition rate (6%) and the search time (1.077 s).
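The detection part in the abstract above mentions the Hausdorff distance. As a minimal sketch of that measure alone, the following compares two small point sets; how the paper applies it to catalog detection is not stated in the abstract, and the point sets here are hypothetical edge pixels.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical edge points: a template and a candidate with one outlier.
template = [(0, 0), (1, 0)]
candidate = [(0, 0), (1, 0), (1, 3)]
distance = hausdorff(template, candidate)
print(distance)
```

A single outlying point dominates the result, so practical systems often use a partial or averaged variant; that choice is outside what the abstract describes.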

Search Result 67, Processing Time 0.024 seconds
