Locating Text in Web Images Using Image Based Approaches

Image-Based Text Extraction from Web Images

  • Published: 2002.06.01

Abstract

A text-locating technique capable of locating and extracting text blocks in various Web images is presented. Until recently, this area has received little attention from researchers, even though such text can carry meaningful information for Internet users. The algorithms work without prior knowledge of text orientation, size, or font. Our text extraction algorithm applies edge detection followed by histogram analysis of the intrinsic characteristics of characters, as defined by text-clustering regions, so that text regions are extracted independently of font style and size. A number of experiments yielded acceptable results.
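The abstract does not say which edge operator is used. As a rough illustration of the edge-detection step only, the sketch below assumes a Sobel operator with a fixed threshold on |gx| + |gy|; the function name `sobel_edge_map` and the default threshold of 100 are hypothetical choices, not the authors'. The input is a grayscale image given as a list of rows of integers from 0 to 255.

```python
# Minimal sketch (assumption): Sobel-style gradient magnitude, thresholded into a
# binary edge map. This is NOT necessarily the operator used in the paper.

def sobel_edge_map(gray, threshold=100):
    """Return a binary edge map (list of lists of 0/1) for a grayscale image.

    Border pixels are left as 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: responds to vertical strokes.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: responds to horizontal strokes.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            # |gx| + |gy| is a cheap, common approximation of the gradient magnitude.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


if __name__ == "__main__":
    # A vertical step from black (columns 0-1) to white (columns 2-4).
    img = [[0, 0, 255, 255, 255] for _ in range(5)]
    for row in sobel_edge_map(img):
        print(row)
```

Text in Web images typically produces dense clusters of such edge pixels, which is the property a subsequent histogram step can exploit; a fixed threshold is the simplest choice but would likely need tuning per image.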

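This paper proposes a method for locating and extracting text blocks from various Web images. From the Internet user's point of view, the text contained in Web images is important information, yet until recently research in this area has not been very active. The proposed algorithm is designed to run without prior information about text skew, character size, or font. To extract text regions appropriately without being constrained by font style or size, it uses edge detection and histograms of the intrinsic characteristics of characters, defined by text-clustering regions. The proposed method was tested in a number of experiments and produced acceptable results.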
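The abstract does not spell out the histogram analysis either. One common interpretation in text localization is a projection profile: count edge pixels per row and keep runs of rows whose counts reach a threshold as candidate text bands. The sketch below assumes a binary edge map (a list of rows of 0/1) such as an edge detector produces; `find_text_bands`, `min_count`, and `min_height` are hypothetical names. Unlike the orientation-independent method described in the abstract, this sketch only handles horizontal text lines.

```python
# Minimal sketch (assumption): horizontal projection profile over a binary edge map.
# Illustrates one reading of "histogram analysis"; not the paper's exact procedure.

def horizontal_projection(edges):
    """Count edge pixels in each row (the horizontal projection histogram)."""
    return [sum(row) for row in edges]


def find_text_bands(edges, min_count=1, min_height=1):
    """Return inclusive (start_row, end_row) pairs of consecutive rows whose
    edge count is >= min_count. Runs shorter than min_height rows are dropped."""
    profile = horizontal_projection(edges)
    bands = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_count:
            if start is None:
                start = y  # a new candidate band begins
        elif start is not None:
            if y - start >= min_height:
                bands.append((start, y - 1))
            start = None
    # Close a band that runs to the last row.
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile) - 1))
    return bands


if __name__ == "__main__":
    edges = [[0, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
    print(horizontal_projection(edges))
    print(find_text_bands(edges))
    print(find_text_bands(edges, min_height=2))
```

Applying the same profile along columns within each band would split it further into candidate text blocks; `min_height` acts as a crude filter against isolated edge noise.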
