Segmentation of region strings using connection-characteristic function

Region Segmentation and String Extraction in Document Images Using Connection-Characteristic Functions

  • Published : 1997.11.01

Abstract

This paper describes a method for region segmentation and string extraction in documents that mix text, graphics, and pictures, based on the structural characteristics of connected components. Non-text regions are segmented using connection-characteristic functions derived from the structural characteristics of the connected components. String extraction proceeds in four steps. First, we form basic unit regions whose vertical and horizontal lengths are 1/4 of the average length of the connected components. Second, we merge basic unit regions whose values fall below a given connection intensity threshold into word blocks. Third, we link word blocks with similar block angles to create initial strings. Finally, the complete strings are generated by merging the remaining word blocks, whose angles are undetermined, into the initial strings when their height and position are similar. The method can extract strings that are not horizontal and strings that contain characters of various sizes. Computer experiments on documents of different styles demonstrate the feasibility of the method.
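The first two string-extraction steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the connection intensity, so the sketch substitutes the gap between bounding boxes as a stand-in measure. The names `Box`, `basic_unit_size`, `gap`, and `merge_into_blocks` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Bounding box of a connected component (hypothetical representation)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def basic_unit_size(components):
    """Return (width, height) of a basic unit region:
    1/4 of the average component width and height."""
    avg_w = mean(c.w for c in components)
    avg_h = mean(c.h for c in components)
    return avg_w / 4.0, avg_h / 4.0


def gap(a, b):
    """Assumed stand-in for connection intensity: the larger of the
    horizontal and vertical separations (0 if the boxes overlap)."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return max(dx, dy)


def merge_into_blocks(components, threshold):
    """Group components whose pairwise gap is below the threshold,
    transitively, using a small union-find structure."""
    parent = list(range(len(components)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(components)):
        for j in range(i + 1, len(components)):
            if gap(components[i], components[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(components):
        groups.setdefault(find(i), []).append(c)
    # Order blocks left to right so the result is deterministic.
    return sorted(groups.values(), key=lambda g: min(c.x for c in g))


# Five 10x10 characters on one line: a three-letter and a two-letter word.
chars = [Box(0, 0, 10, 10), Box(12, 0, 10, 10), Box(24, 0, 10, 10),
         Box(60, 0, 10, 10), Box(72, 0, 10, 10)]
print(basic_unit_size(chars))                                    # (2.5, 2.5)
print([len(b) for b in merge_into_blocks(chars, threshold=4.0)])  # [3, 2]
```

The threshold of 4.0 is an arbitrary value for the example. Linking blocks by angle and absorbing blocks with undetermined angles (the third and fourth steps) are omitted because the abstract gives no detail on how the block angle is computed.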
