Typographical Analyses and Classes of Characters and Words in Optical Character Recognition

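Korean title (translated): Analysis of Typographical Line Positions Between Words and Classification into Classes in Character Recognition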

  • 정민철 (Department of Computer System Engineering, Sangmyung University)
  • Published : 2005.06.01

Abstract

This paper presents typographical analysis and the typographical classes derived from it. Typographical analysis is an indispensable tool for recognizing machine-printed English text, and it serves as a preliminary step for character segmentation in OCR (optical character recognition). The paper is divided into two parts. In the first part, word typographical classes are defined by applying typographical analysis to whole words. In the second part, character typographical classes are defined by applying typographical analysis to connected components. The character typographical classes are then used in character segmentation.
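
The abstract does not say how the typographical lines of a word are located. One common approach for printed text, used here purely as an assumed illustration and not as the paper's method, is to read them off the word's horizontal projection profile: the x-height band (between the mean line and the baseline) is usually the densest run of rows, and ink above or below that band indicates ascenders or descenders. The function names `estimate_lines` and `word_class`, the `core_ratio` and `tol` parameters, and the four class labels are all hypothetical.

```python
from typing import List, Tuple

# Binary word image: img[y][x] == 1 for ink, 0 for background.
Image = List[List[int]]


def estimate_lines(img: Image, core_ratio: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (top, mean_line, baseline, bottom) as row indices.

    Assumed heuristic: the x-height band is the run of rows around the densest
    row whose ink count is at least core_ratio times the maximum count.
    """
    profile = [sum(row) for row in img]  # horizontal projection profile
    ink_rows = [y for y, count in enumerate(profile) if count > 0]
    if not ink_rows:
        raise ValueError("image contains no ink")
    top, bottom = ink_rows[0], ink_rows[-1]
    peak = max(profile)
    threshold = core_ratio * peak
    center = profile.index(peak)
    # Grow the core (x-height) band upward and downward from the densest row.
    mean_line = center
    while mean_line > 0 and profile[mean_line - 1] >= threshold:
        mean_line -= 1
    baseline = center
    while baseline < len(profile) - 1 and profile[baseline + 1] >= threshold:
        baseline += 1
    return top, mean_line, baseline, bottom


def word_class(img: Image, tol: int = 1) -> str:
    """Classify a word by whether ink extends above the mean line or below the baseline."""
    top, mean_line, baseline, bottom = estimate_lines(img)
    has_ascender = top < mean_line - tol
    has_descender = bottom > baseline + tol
    if has_ascender and has_descender:
        return "ascender+descender"  # e.g. "dog"
    if has_ascender:
        return "ascender"            # e.g. "the"
    if has_descender:
        return "descender"           # e.g. "gym"
    return "x-height"                # e.g. "now"
```

The heuristic rests on the core band being denser than the ascender and descender strokes. A word set entirely in capitals has no distinct core band, so this sketch would report it as "x-height"; a practical system would need extra cues, such as line-level statistics, to handle that case.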

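Korean abstract (translated): This paper proposes typographical line analysis and the classification into classes that follows from it. Typographical line analysis is an indispensable element of English machine-printed character recognition, and it is a preprocessing step for character segmentation. The paper is divided into two parts. The first part defines word typographical-line classes through typographical line analysis across words. The second part defines character typographical-line classes through typographical line analysis across characters. Both the word classes and the character classes defined in this way are used to achieve accurate character segmentation.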
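
For the character-level part, the abstract says the classes are defined from connected components. As a hedged sketch, assuming the mean line and baseline of the word are already known, one could label 8-connected ink components with a breadth-first flood fill and classify each bounding box by where it sits relative to those two lines. The names `connected_components` and `component_class`, the `tol` slack, and the five labels are hypothetical illustrations, not the class set defined in the paper.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # img[y][x] == 1 for ink
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(img: Image) -> List[Box]:
    """Bounding boxes of 8-connected ink regions, sorted left to right."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda b: b[1])


def component_class(box: Box, mean_line: int, baseline: int, tol: int = 1) -> str:
    """Place a component's bounding box relative to the word's mean line and baseline."""
    top, _, bottom, _ = box
    if bottom < mean_line - tol:
        return "above"      # entirely above the x-height band, e.g. the dot of i or j
    rises = top < mean_line - tol
    drops = bottom > baseline + tol
    if rises and drops:
        return "full"       # spans both extra zones, e.g. brackets
    if rises:
        return "ascender"   # e.g. b, d, h, k, l, capitals, digits
    if drops:
        return "descender"  # e.g. g, p, q, y
    return "x-height"       # e.g. a, c, e, o
```

Working on connected components means that touching characters come out as a single, unusually wide component. Splitting such components is the task of the segmentation step that these classes are meant to support.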
