Bilingual document analysis and character segmentation using connected components

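Korean title: 연결요소를 이용한 한.영 혼용문서의 구조분석 및 낱자분리 ("Structure Analysis and Character Segmentation of Mixed Korean-English Documents Using Connected Components")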

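  • 김민기 (Department of Computer Engineering, Chung-Ang University)
  • 권영빈 (Department of Computer Engineering, Chung-Ang University)
  • 한상용 (Department of Computer Engineering, Chung-Ang University)
  • Published: 1997.03.01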

Abstract

In this paper, we describe a bottom-up method for analyzing the structure of bilingual Korean-English documents, together with a character segmentation method based on the layout information of the connected components that make up each character. Much previous research divides a document only into text blocks and graphics; we instead analyze a document into four kinds of region: text, table, graphic, and separator. Text regions are recursively subdivided into text blocks, text lines, words, and characters. To extract characters from bilingual text, we propose a new method for classifying each word as Korean or English. Furthermore, on Korean word blocks we apply character merging and segmentation rules that follow the structural properties of Hangul. Experimental results on various documents show that the proposed method performs both document structure analysis and character segmentation effectively.
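The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general bottom-up idea, assuming a binary image stored as a list of 0/1 rows. It labels 8-connected components, groups them into text lines by vertical overlap, splits each line into words wherever the horizontal gap exceeds a threshold, and merges components whose column ranges overlap into one character box. The names `Box`, `connected_components`, `group_lines`, `split_by_gap`, `segment` and the `word_gap` threshold are hypothetical. The table/graphic/separator classification, the Korean/English word discrimination, and the paper's Hangul-specific merging and segmentation rules are not reproduced; for example, side-by-side jamo such as ㄱ and ㅏ in 가 would stay separate here.

```python
from collections import deque
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Box:
    """Bounding box with inclusive pixel coordinates."""
    top: int
    left: int
    bottom: int
    right: int

    def merge(self, other: "Box") -> "Box":
        return Box(min(self.top, other.top), min(self.left, other.left),
                   max(self.bottom, other.bottom), max(self.right, other.right))


def connected_components(img):
    """Bounding boxes of 8-connected foreground (1) pixels, found by BFS."""
    h = len(img)
    w = len(img[0]) if img else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, box = deque([(y, x)]), Box(y, x, y, x)
            while queue:
                cy, cx = queue.popleft()
                box = box.merge(Box(cy, cx, cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append(box)
    return boxes


def group_lines(boxes):
    """Bottom-up line grouping: a box joins the first line whose vertical extent it overlaps."""
    lines = []  # each entry: [line bounding box, member boxes]
    for b in sorted(boxes, key=lambda b: b.top):
        for line in lines:
            if b.top <= line[0].bottom and b.bottom >= line[0].top:
                line[0] = line[0].merge(b)
                line[1].append(b)
                break
        else:
            lines.append([b, [b]])
    # Return each line's members ordered left to right.
    return [sorted(members, key=lambda b: b.left) for _, members in lines]


def split_by_gap(boxes, threshold):
    """Split left-to-right boxes wherever the number of blank columns exceeds threshold.

    threshold >= 0 separates words; threshold = -1 keeps only column-overlapping
    components together, e.g. pieces stacked vertically within one character.
    """
    groups = [[boxes[0]]]
    right = boxes[0].right
    for b in boxes[1:]:
        if b.left - right - 1 > threshold:
            groups.append([b])
            right = b.right
        else:
            groups[-1].append(b)
            right = max(right, b.right)
    return groups


def segment(img, word_gap=2):
    """Image -> text lines -> words -> character bounding boxes."""
    result = []
    for line in group_lines(connected_components(img)):
        words = split_by_gap(line, word_gap)
        result.append([[reduce(Box.merge, ch) for ch in split_by_gap(word, -1)]
                       for word in words])
    return result


if __name__ == "__main__":
    rows = ["11.1....11", "11.1....11", "11.1....11", "..........",
            "..........", "1.1.......", "..1.......", "1.1......."]
    img = [[1 if c == "1" else 0 for c in r] for r in rows]
    for line in segment(img):
        print(line)
```

In this toy image the first line yields two words (two characters, then one), and in the second line the two vertically separated dots in column 0 are merged into a single character box because their column ranges overlap. A fixed `word_gap` is the simplest choice; a practical system would more likely derive it from the measured character widths on each line.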
