• Title/Summary/Keyword: Text Recognition

Implementation of Pen-Gesture Recognition System for Multimodal User Interface (멀티모달 사용자 인터페이스를 위한 펜 제스처인식기의 구현)

  • 오준택;이우범;김욱현
    • Proceedings of the IEEK Conference
    • /
    • 2000.11c
    • /
    • pp.121-124
    • /
    • 2000
  • In this paper, we propose a pen gesture recognition system for the user interface of multimedia terminals, which require fast processing and a high recognition rate. It is a real-time system in which the graphic and text modules interact. Text editing is performed either by pen gestures in the graphic module or by direct editing in the text module, and 14 editing functions are provided in all. Pen gestures are recognized by searching the pen gesture model for the classification features extracted from the input strokes. The pen gesture model is constructed from classification features, i.e., cross number, direction change, direction code number, position relation, and distance ratio information, for the 15 defined gesture types. In a recognition experiment, the proposed system achieved a 98% correct recognition rate and an average processing time of 30 msec.

Research on Korea Text Recognition in Images Using Deep Learning (딥 러닝 기법을 활용한 이미지 내 한글 텍스트 인식에 관한 연구)

  • Sung, Sang-Ha;Lee, Kang-Bae;Park, Sung-Ho
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.6
    • /
    • pp.1-6
    • /
    • 2020
  • This study addresses character recognition, one of the fields of computer vision. Optical character recognition, one of the most widely used character recognition techniques, suffers from a lower recognition rate when the recognition target deviates from a certain standard or format. This study therefore aimed to address this limitation by applying deep learning techniques to character recognition. In addition, because most character recognition studies have been limited to English or number recognition, the recognition range was expanded through additional training on Korean text data. As a result, this study derived a deep learning-based character recognition algorithm for Korean text. The algorithm obtained a score of 0.841 on the 1-NED evaluation method, which is similar to the result for English recognition. Finally, based on an analysis of the results, major issues with Korean text recognition and possible future study tasks are introduced.
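
The abstract reports a 1-NED score without defining it. The sketch below assumes the definition commonly used in scene-text benchmarks: one minus the mean edit distance between prediction and ground truth, each normalized by the longer string's length. The function names are illustrative and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def one_minus_ned(predictions, references):
    """1 - mean normalized edit distance (assumed definition, see lead-in)."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for pred, ref in zip(predictions, references):
        longest = max(len(pred), len(ref))
        if longest:  # two empty strings count as a perfect match
            total += edit_distance(pred, ref) / longest
    return 1.0 - total / len(predictions)


# One exact match and one two-syllable word with a single wrong syllable.
score = one_minus_ned(["한글", "한굴"], ["한글", "한글"])
```

Python compares precomposed Hangul syllables as single characters, so one wrong syllable in a two-syllable word costs 0.5 under this definition.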

Correction of Signboard Distortion by Vertical Stroke Estimation

  • Lim, Jun Sik;Na, In Seop;Kim, Soo Hyung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.9
    • /
    • pp.2312-2325
    • /
    • 2013
  • In this paper, we propose a preprocessing method that corrects the distortion of the text area in Korean signboard images in order to improve character recognition. Perspective distortion in Korean signboard text can cause a low recognition rate. The proposed method consists of four main steps and eight sub-steps; the main steps are potential vertical component detection, vertical component detection, text-boundary estimation, and distortion correction. First, potential vertical line component detection consists of four sub-steps: edge detection for each connected component, pixel distance normalization along the edge, dominant-point detection on the edge, and removal of horizontal components. Second, vertical line component detection is composed of removal of diagonal components and extraction of vertical line components. Third, text-boundary (outline) estimation is composed of left and right boundary line detection. Finally, the distortion of the text image is corrected by a bilinear transformation based on the estimated outline. We compared the OCR recognition rates before and after applying the proposed algorithm. The recognition rate for the distortion-corrected signboard images is 29.63% higher at the character level and 21.9% higher at the text level than that for the original images.
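
The final correction step in the abstract above relies on a bilinear transformation. As a minimal sketch of that idea only, the code below maps each pixel of an upright output rectangle back into a quadrilateral in the source image. The corner order, the nearest-neighbour sampling, and the function names are assumptions made for illustration; the abstract does not give the authors' exact formulation.

```python
def bilinear_map(u, v, tl, tr, bl, br):
    """Map (u, v) in the unit square to a point inside the quadrilateral.

    tl, tr, bl, br are (x, y) corners: top-left, top-right,
    bottom-left, bottom-right (an assumed ordering).
    """
    x = (1 - u) * (1 - v) * tl[0] + u * (1 - v) * tr[0] \
        + (1 - u) * v * bl[0] + u * v * br[0]
    y = (1 - u) * (1 - v) * tl[1] + u * (1 - v) * tr[1] \
        + (1 - u) * v * bl[1] + u * v * br[1]
    return x, y


def rectify(image, corners, out_w, out_h):
    """Resample the quadrilateral `corners` of `image` into an upright grid.

    `image` is a list of rows; sampling is nearest-neighbour for brevity.
    """
    tl, tr, bl, br = corners
    src_h, src_w = len(image), len(image[0])
    out = []
    for row in range(out_h):
        v = row / (out_h - 1) if out_h > 1 else 0.0
        line = []
        for col in range(out_w):
            u = col / (out_w - 1) if out_w > 1 else 0.0
            x, y = bilinear_map(u, v, tl, tr, bl, br)
            xi = min(max(int(round(x)), 0), src_w - 1)  # clamp to the image
            yi = min(max(int(round(y)), 0), src_h - 1)
            line.append(image[yi][xi])
        out.append(line)
    return out


# Sanity check: an undistorted quadrilateral reproduces the input.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
same = rectify(grid, ((0, 0), (2, 0), (0, 2), (2, 2)), 3, 3)
```

In practice the four corners would come from the estimated left and right boundary lines, and a real implementation would interpolate between neighbouring pixels rather than take the nearest one.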

Development an Android based OCR Application for Hangul Food Menu (한글 음식 메뉴 인식을 위한 OCR 기반 어플리케이션 개발)

  • Lee, Gyu-Cheol;Yoo, Jisang
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.5
    • /
    • pp.951-959
    • /
    • 2017
  • In this paper, we design and implement an Android-based Hangul food menu recognition application that recognizes characters from images captured by a smartphone. Optical Character Recognition (OCR) is divided into preprocessing, recognition, and post-processing. In preprocessing, characters are extracted using Maximally Stable Extremal Regions (MSER). In recognition, Tesseract-OCR, a free OCR engine, is used to recognize the characters. In post-processing, wrong results are corrected using a dictionary DB of food menu items. To evaluate the proposed method, experiments compared recognition performance using actual menu boards as the DB. Recognition rates were also measured against OCR Instantly Free, Text Scanner, and Text Fairy, which are character recognition applications in the Google Play Store. The experimental results show that the proposed method achieves an average recognition rate 14.1% higher than the other techniques.
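
The post-processing step described above, correcting OCR output against a menu dictionary, can be sketched with Python's standard `difflib` module. The menu entries, the similarity cutoff, and the function name are hypothetical; the abstract does not say how the authors' dictionary DB matches entries.

```python
from difflib import get_close_matches

# Hypothetical dictionary DB of menu items (illustrative entries only).
MENU_DICTIONARY = ["김치찌개", "된장찌개", "비빔밥", "불고기"]


def correct_menu_text(ocr_text, dictionary=MENU_DICTIONARY, cutoff=0.6):
    """Return the closest dictionary entry, or the raw OCR text if none is close.

    `cutoff` is the minimum similarity ratio (0..1) used by difflib;
    0.6 is difflib's default, not a value taken from the paper.
    """
    matches = get_close_matches(ocr_text, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text


# "김치찌게" stands in for a one-syllable OCR misread of "김치찌개".
fixed = correct_menu_text("김치찌게")
```

Falling back to the raw text when nothing is close enough keeps the corrector from forcing unrelated strings onto a menu item.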

Real Scene Text Image Super-Resolution Based on Multi-Scale and Attention Fusion

  • Xinhua Lu;Haihai Wei;Li Ma;Qingji Xue;Yonghui Fu
    • Journal of Information Processing Systems
    • /
    • v.19 no.4
    • /
    • pp.427-438
    • /
    • 2023
  • Many works have indicated that single image super-resolution (SISR) models relying on synthetic datasets are difficult to apply to real scene text image super-resolution (STISR) because of its more complex degradation. The most recent dataset for realistic STISR is TextZoom, but current methods trained on this dataset have not considered the effect of multi-scale features of text images. In this paper, a multi-scale and attention fusion model for realistic STISR is proposed. A multi-scale learning mechanism is introduced to acquire sophisticated feature representations of text images; spatial and channel attention are introduced to capture the local information and inter-channel interaction information of text images; finally, a multi-scale residual attention module is designed by fusing multi-scale learning and attention mechanisms. Experiments on TextZoom demonstrate that the proposed model increases the average recognition accuracy of the scene text recognizer ASTER by 1.2% compared with the text super-resolution network.

Speaker Identification using Phonetic GMM (음소별 GMM을 이용한 화자식별)

  • Kwon Sukbong;Kim Hoi-Rin
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.185-188
    • /
    • 2003
  • In this paper, we construct a phonetic GMM for a text-independent speaker identification system. The basic idea is to combine the advantages of the baseline GMM and the HMM. The GMM is better suited to text-independent speaker identification, whereas the HMM works better in text-dependent systems. The phonetic GMM represents a more sophisticated text-dependent speaker model built on a text-independent speaker model. In a speaker identification system, the phonetic GMM using HMM-based speaker-independent phoneme recognition performs better than the baseline GMM. In addition, an N-best recognition algorithm is used to reduce computational complexity and to make the method applicable to new speakers.

TextNAS Application to Multivariate Time Series Data and Hand Gesture Recognition (textNAS의 다변수 시계열 데이터로의 적용 및 손동작 인식)

  • Kim, Gi-duk;Kim, Mi-sook;Lee, Hack-man
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.518-520
    • /
    • 2021
  • In this paper, we propose a hand gesture recognition method that modifies textNAS, originally used for text classification, so that it can be applied to multivariate time series data. Through multivariate time series classification, the method can be applied to various fields such as behavior recognition, emotion recognition, and hand gesture recognition. In addition, it automatically finds a deep learning model suitable for the classification task through training, which reduces the burden on users and yields high classification accuracy. Applying the proposed method to the DHG-14/28 and Shrec'17 hand gesture recognition datasets gave higher classification accuracy than existing models. The classification accuracy was 98.72% and 98.16% for DHG-14/28, and 97.82% and 98.39% for Shrec'17 14 class/28 class.

Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier (OCR 엔진 기반 분류기 애드온 결합을 통한 이미지 내부 텍스트 인식 성능 향상)

  • Chae, Ho-Yeol;Seok, Ho-Sik
    • Journal of IKEEE
    • /
    • v.24 no.4
    • /
    • pp.1086-1092
    • /
    • 2020
  • An autonomous agent operating in the real world should be able to recognize text in scenes. With the advancement of deep learning, various DNN models have been used for transformation, feature extraction, and prediction. However, existing state-of-the-art STR (Scene Text Recognition) engines do not achieve the performance required for real-world applications. In this paper, we introduce a performance-improvement method for STR engines through an add-on composed of an OCR (Optical Character Recognition) engine and a classifier. On instances from the IC13 and IC15 datasets that an STR engine failed to recognize, our method recognizes 10.92% of the previously unrecognized characters.

Weibo Disaster Rumor Recognition Method Based on Adversarial Training and Stacked Structure

  • Diao, Lei;Tang, Zhan;Guo, Xuchao;Bai, Zhao;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3211-3229
    • /
    • 2022
  • To address problems in Weibo disaster rumor recognition, such as the lack of a corpus, poor text standardization, difficulty in learning semantic information, and the simple semantic features of disaster rumor text, this paper takes Sina Weibo as the data source, constructs a dataset for Weibo disaster rumor recognition, and proposes a deep learning model, BERT_AT_Stacked LSTM. First, adversarial perturbations are added to the embedding vector of each word to generate adversarial samples that enhance the features of rumor text, and adversarial training is carried out to address the problem that the text features of disaster rumors lack diversity. Second, the BERT part obtains the word-level semantic information of each Weibo text and generates a hidden vector containing sentence-level feature information. Finally, the hidden complex semantic information of poorly standardized Weibo texts is learned using a Stacked Long Short-Term Memory (Stacked LSTM) structure. The experimental results show that the proposed model outperforms the comparison models in recognizing disaster rumors on Weibo, with an F1_Score of 97.48%; it was also tested on an open general-domain dataset, where it reached an F1_Score of 94.59%, indicating that the model generalizes well.
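
The abstract does not say how the adversarial perturbation on the word embeddings is computed. One common choice, assumed here and not confirmed by the source, is the fast gradient method: step each embedding along the loss gradient, scaled to a fixed L2 norm. The sketch takes the gradient as an input, since obtaining it requires the full model; the function name and `epsilon` value are illustrative.

```python
import math


def fgm_perturb(embedding, gradient, epsilon=1.0):
    """Return embedding + epsilon * gradient / ||gradient||_2.

    `gradient` is the loss gradient with respect to `embedding`, supplied by
    the caller. This is an assumed fast-gradient-method step, not the
    paper's confirmed procedure.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(embedding)  # no gradient direction, leave unchanged
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]


# A 2-dimensional toy embedding; the gradient (3, 4) has L2 norm 5.
adversarial = fgm_perturb([1.0, 2.0], [3.0, 4.0], epsilon=0.5)
```

During adversarial training, the loss on the perturbed embeddings would typically be added to the loss on the clean ones before the parameter update.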

Action recognition, hand gesture recognition, and emotion recognition using text classification method (Text classification 방법을 사용한 행동 인식, 손동작 인식 및 감정 인식)

  • Kim, Gi-Duk
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.213-216
    • /
    • 2021
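  • In this paper, we propose a method for action recognition, hand gesture recognition, and emotion recognition that applies a deep learning model used for text classification. First, features are extracted from video using a library, and formulas are then applied to them to produce the feature vectors that are stored. These vectors are used to train a model that combines Conv1D, Transformer, and GRU. With this method, a single deep learning model can be applied to a variety of fields. Using the proposed method, we obtained classification accuracies of 99.66% on the SYSU 3D HOI dataset, 99.0% on the eNTERFACE'05 dataset, and 95.48% on the DHG-14 dataset.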
