Deep-Learning Approach for Text Detection Using Fully Convolutional Networks

  • Received : 2017.08.21
  • Accepted : 2018.03.15
  • Published : 2018.03.28

Abstract

Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is useful in a wide range of vision-based applications: text extracted from images can support automatic annotation, indexing, language translation, and assistance systems for impaired persons. Natural-scene text detection is therefore an important and active research topic in computer vision and document analysis. Previous methods perform poorly because they produce numerous false-positive and false-negative regions. In this paper, a fully-convolutional-network (FCN)-based method with a supervised architecture is used to localize textual regions. The model is trained directly on images, with raw pixel values as inputs and binary ground-truth maps as labels. The method was evaluated on the ICDAR-2013 dataset and proved comparable to other feature-based methods. It could expedite future research on deep-learning-based text detection.
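
Below is a minimal sketch, in PyTorch, of the kind of supervised pixel-wise training setup described above: raw pixel values as inputs and a binary ground-truth map as the label. The VGG-16 backbone (an assumption based on reference [30]), the loss, and the optimizer settings are illustrative choices rather than the authors' exact configuration.

```python
# Minimal sketch (not the authors' code) of a supervised, pixel-wise FCN
# training step: raw image pixels go in, a binary text/non-text ground-truth
# map is the label. Backbone, loss, and optimizer are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


class TextFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # VGG-16 convolutional stages as the encoder (assumption based on [30]).
        self.encoder = torchvision.models.vgg16(weights=None).features
        # 1x1 convolution scoring each location as text vs. non-text.
        self.score = nn.Conv2d(512, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        score = self.score(self.encoder(x))      # coarse per-location scores
        # Upsample to the input resolution so the output aligns with the
        # binary ground-truth map.
        return F.interpolate(score, size=(h, w), mode="bilinear",
                             align_corners=False)


model = TextFCN()
criterion = nn.BCEWithLogitsLoss()                # pixel-wise binary loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative step with random tensors standing in for an image batch
# and its binary ground-truth maps (1 = text pixel, 0 = background).
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 2, (2, 1, 224, 224)).float()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```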

Keywords

Fig. 1. Examples of Text in Natural-scene Images [3].

Fig. 2. Transforming fully connected layers into convolution layers enables a classification net to output a heat map. Adding layers and a spatial loss produces an efficient machine for end-to-end dense learning [26].
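
As context for Fig. 2, the conversion from [26] replaces a fully connected classifier layer with an equivalent convolution whose kernel spans the layer's original input patch, so the network can run on larger images and output a spatial heat map rather than a single score. A minimal sketch with illustrative layer sizes:

```python
# Minimal sketch of the idea behind Fig. 2 / [26]: a fully connected classifier
# over a fixed-size feature patch is equivalent to a convolution whose kernel
# covers that patch, so recasting it as a convolution lets the network slide
# over larger inputs and emit a heat map of scores. Sizes are illustrative.
import torch
import torch.nn as nn

# Fully connected layer classifying a 512 x 7 x 7 feature patch into 2 classes.
fc = nn.Linear(512 * 7 * 7, 2)

# Equivalent convolution: the same weights reshaped into a 7x7 kernel.
conv = nn.Conv2d(512, 2, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(2, 512, 7, 7))
    conv.bias.copy_(fc.bias)

features = torch.randn(1, 512, 14, 14)   # feature map from a larger image
heat_map = conv(features)                # shape (1, 2, 8, 8): a grid of scores
```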

Fig. 3. FCN architecture used in the proposed method

Fig. 4. Example of Feature-map Generation Using the Supervised Fully Convolutional Network (FCN). (a) Input image, (b)-(e) feature maps from stages 1-4, and (f) feature maps from stage 5 and the final salient map

Fig. 5. Illustration of the ground-truth map
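
A hedged sketch of how such a binary ground-truth map can be rasterized from word-level annotations; it assumes axis-aligned word boxes as provided by ICDAR-2013, and the paper's exact construction may differ:

```python
# Hedged sketch of building a binary ground-truth map as illustrated in Fig. 5:
# pixels inside annotated word boxes are labeled 1 (text), the rest 0.
# Assumes axis-aligned (x_min, y_min, x_max, y_max) boxes as in ICDAR-2013;
# the paper's exact construction may differ.
import numpy as np

def ground_truth_map(height, width, boxes):
    """Rasterize word bounding boxes into a binary text / non-text mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        mask[y_min:y_max, x_min:x_max] = 1
    return mask

# Example: two word boxes on a 100 x 200 image.
gt = ground_truth_map(100, 200, [(10, 20, 80, 40), (90, 50, 180, 75)])
```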

Fig. 6. Detection Examples of the Proposed Method

Table 1. Performance of Different Algorithms Evaluated on the International Conference on Document Analysis and Recognition (ICDAR)-2013 Dataset.
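
For reference, the standard ICDAR-2013 evaluation protocol reports precision P, recall R, and the F-measure F = 2PR / (P + R), the harmonic mean of the two; it is assumed here that these are the quantities compared in Table 1.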

References

  1. C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Rotation-invariant features for multi-oriented text detection in natural images," PLoS One, vol. 8, no. 8, 2013.
  2. C. Yao, X. Bai, B. Shi, and W. Liu, "Strokelets: A learned multi-scale representation for scene text recognition," CVPR, 2014.
  3. ICDAR 2013 robust reading competition, http://dag.cvc.uab.es/icdar2013competition, 2014.
  4. X. Chen and A. Yuille, "Detecting and reading text in natural scenes," CVPR, 2004.
  5. Y. Zhong, K. Karu, and A. K. Jain, "Locating text in complex color images," Pattern Recognition, vol. 28, no. 10, 1995, pp. 1523-1535. https://doi.org/10.1016/0031-3203(95)00030-4
  6. K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. PAMI, vol. 25, no. 12, 2003, pp.1631-1639. https://doi.org/10.1109/TPAMI.2003.1251157
  7. J. Gllavata, R. Ewerth, and B. Freisleben, "Text detection in images based on unsupervised classification of high-frequency wavelet coefficients," ICPR, 2004.
  8. B. Leibe and B. Schiele, "Scale-invariant object categorization using a scale-adaptive mean-shift search," Pattern Recognition, 2004, pp. 145-153.
  9. M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. CSVT, vol. 15, no. 2, 2005, pp. 243-255.
  10. Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. PAMI, vol. 22, no. 4, 2000, pp. 385-392. https://doi.org/10.1109/34.845381
  11. P. Viola and M. Jones, "Fast and robust classification using asymmetric adaboost and a detector cascade," Proc. of NIPS, 2001.
  12. V. Wu, R. Manmatha, and E. M. Riseman, "Finding text in images," ACM Int. Conf. Digital Libraries, 1997.
  13. C. Wolf and J. M. Jolion, "Extraction and recognition of artificial text in multimedia documents," Formal Pattern Analysis and Applications, vol. 6, no. 4, 2004, pp. 309-326. https://doi.org/10.1007/s10044-003-0197-7
  14. K. Wang and S. Belongie, "Word spotting in the wild," ECCV, 2010.
  15. B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," CVPR, 2010.
  16. C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," CVPR, 2012.
  17. C. Yi and Y. Tian, "Text string detection from natural scenes by structure-based partition and grouping," IEEE Trans. Image Processing, vol. 20, no. 9, 2011, pp. 2594-2605. https://doi.org/10.1109/TIP.2011.2126586
  18. W. Huang, Z. Lin, J. Yang, and J. Wang, "Text localization in natural images using stroke feature transform and text covariance descriptors," ICCV, 2013.
  19. A. Jain and B. Yu, "Automatic text location in images and video frames," Pattern Recognition, vol. 31, no. 12, 1998, pp. 2055-2076. https://doi.org/10.1016/S0031-3203(98)00067-3
  20. L. Neumann and J. Matas, "A method for text localization and recognition in real-world images," ACCV, 2010.
  21. J. Wright, A. Y. Yang, and A. Ganesh, "Robust face recognition via sparse representation," IEEE Trans. PAMI, vol. 31, no. 2, 2009, pp. 210-227. https://doi.org/10.1109/TPAMI.2008.79
  22. M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Processing, vol. 15, no. 12, 2006, pp. 3736-3745. https://doi.org/10.1109/TIP.2006.881969
  23. M. Zhao, S. Li, and J. Kwok, "Text detection in images using sparse representation with discriminative dictionaries," Image and Vision Computing, vol. 28, no. 12, 2010, pp. 1590-1599. https://doi.org/10.1016/j.imavis.2010.04.002
  24. P. Shivakumara, T. Q. Phan, and C. L. Tan, "A laplacian approach to multi-oriented text detection in video," IEEE Trans. PAMI, vol. 33, no. 2, 2011, pp. 412-419. https://doi.org/10.1109/TPAMI.2010.166
  25. Y. Pan, X. Hou, and C. Liu, "A hybrid approach to detect and localize texts in natural scene images," IEEE Trans. Image Processing, vol. 20, no. 3, 2011, pp. 800-813. https://doi.org/10.1109/TIP.2010.2070803
  26. J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," CVPR, 2015.
  27. W. Huang, Y. Qiao, and X. Tang, "Robust scene text detection with convolution neural network induced mser trees," ECCV, 2014.
  28. Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, "Multi-oriented text detection with fully convolutional networks," CVPR, 2016.
  29. S. Xie and Z. Tu, "Holistically-Nested Edge Detection," ICCV, 2015.
  30. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.
  31. X. C. Yin, W. Y. Pei, J. Zhang, and H. W. Hao, "Multi-orientation scene text detection with adaptive clustering," IEEE Trans. PAMI, vol. 37, no. 9, 2015, pp. 1930-1937. https://doi.org/10.1109/TPAMI.2014.2388210
  32. A. Zamberletti, L. Noce, and I. Gallo, "Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions," ACCV workshop, 2014.
  33. Z. Zhang, W. Shen, C. Yao, and X. Bai, "Symmetry-based text line detection in natural scenes," CVPR, 2015.