Fig. 1. Examples of the Text in Natural-scene Images [3].
Fig. 2. Transforming fully connected layers into convolution layers enables a classification net to output a heat map. Adding layers and a spatial loss produces an efficient machine for end-to-end dense learning [26].
Fig. 3. FCN architecture used in the proposed method
Fig. 4. Example of Feature-map Generation Using the Supervised Fully Convolutional Network (FCN). (a) Input image, (b)-(e) feature maps from stages 1-4, and (f) feature maps from stage 5 and the final salient map.
Fig. 5. Illustration of the ground-truth map
Fig. 6. Detection Examples of the Proposed Method
Table 1. Performance of Different Algorithms Evaluated on the International Conference on Document Analysis and Recognition (ICDAR) 2013 Dataset.
References
- C. Yao, X. Zhang, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Rotation-invariant features for multi-oriented text detection in natural images," PLoS One, vol. 8, no. 8, 2013.
- C. Yao, X. Bai, B. Shi, and W. Liu, "Strokelets: A learned multi-scale representation for scene text recognition," CVPR, 2014.
- ICDAR 2013 robust reading competition, http://dag.cvc.uab.es/icdar2013competition, 2014.
- X. Chen and A. Yuille, "Detecting and reading text in natural scenes," CVPR, 2004.
- Y. Zhong, K. Karu, and A. K. Jain, "Locating text in complex color images," Pattern Recognition, vol. 28, no. 10, 1995, pp. 1523-1535. https://doi.org/10.1016/0031-3203(95)00030-4
- K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. PAMI, vol. 25, no. 12, 2003, pp. 1631-1639. https://doi.org/10.1109/TPAMI.2003.1251157
- J. Gllavata, R. Ewerth, and B. Freisleben, "Text detection in images based on unsupervised classification of high-frequency wavelet coefficients," ICPR, 2004.
- B. Leibe and B. Schiele, "Scale-invariant object categorization using a scale-adaptive mean-shift search," Pattern Recognition, 2004, pp. 145-153.
- M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. CSVT, vol. 15, no. 2, 2005, pp. 243-255.
- Y. Zhong, H. Zhang, and A. K. Jain, "Automatic caption localization in compressed video," IEEE Trans. PAMI, vol. 22, no. 4, 2000, pp. 385-392. https://doi.org/10.1109/34.845381
- P. Viola and M. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," Proc. of NIPS, 2001.
- V. Wu, R. Manmatha, and E. M. Riseman, "Finding text in images," ACM Int. Conf. Digital Libraries, 1997.
- C. Wolf and J. M. Jolion, "Extraction and recognition of artificial text in multimedia documents," Formal Pattern Analysis and Applications, vol. 6, no. 4, 2004, pp. 309-326. https://doi.org/10.1007/s10044-003-0197-7
- K. Wang and S. Belongie, "Word spotting in the wild," ECCV, 2010.
- B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," CVPR, 2010.
- C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu, "Detecting texts of arbitrary orientations in natural images," CVPR, 2012.
- C. Yi and Y. Tian, "Text string detection from natural scenes by structure-based partition and grouping," IEEE Trans. Image Processing, vol. 20, no. 9, 2011, pp. 2594-2605. https://doi.org/10.1109/TIP.2011.2126586
- W. Huang, Z. Lin, J. Yang, and J. Wang, "Text localization in natural images using stroke feature transform and text covariance descriptors," ICCV, 2013.
- A. Jain and B. Yu, "Automatic text location in images and video frames," Pattern Recognition, vol. 31, no. 12, 1998, pp. 2055-2076. https://doi.org/10.1016/S0031-3203(98)00067-3
- L. Neumann and J. Matas, "A method for text localization and recognition in real-world images," ACCV, 2010.
- J. Wright, A. Y. Yang, and A. Ganesh, "Robust face recognition via sparse representation," IEEE Trans. PAMI, vol. 31, no. 2, 2009, pp. 210-227. https://doi.org/10.1109/TPAMI.2008.79
- M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Processing, vol. 15, no. 12, 2006, pp. 3736-3745. https://doi.org/10.1109/TIP.2006.881969
- M. Zhao, S. Li, and J. Kwok, "Text detection in images using sparse representation with discriminative dictionaries," Image and Vision Computing, vol. 28, no. 12, 2010, pp. 1590-1599. https://doi.org/10.1016/j.imavis.2010.04.002
- P. Shivakumara, T. Q. Phan, and C. L. Tan, "A Laplacian approach to multi-oriented text detection in video," IEEE Trans. PAMI, vol. 33, no. 2, 2011, pp. 412-419. https://doi.org/10.1109/TPAMI.2010.166
- Y. Pan, X. Hou, and C. Liu, "A hybrid approach to detect and localize texts in natural scene images," IEEE Trans. Image Processing, vol. 20, no. 3, 2011, pp. 800-813. https://doi.org/10.1109/TIP.2010.2070803
- J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," CVPR, 2015.
- W. Huang, Y. Qiao, and X. Tang, "Robust scene text detection with convolution neural network induced MSER trees," ECCV, 2014.
- Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai, "Multi-oriented text detection with fully convolutional networks," CVPR, 2016.
- S. Xie and Z. Tu, "Holistically-Nested Edge Detection," ICCV, 2015.
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.
- X. C. Yin, W. Y. Pei, J. Zhang, and H. W. Hao, "Multi-orientation scene text detection with adaptive clustering," IEEE Trans. PAMI, vol. 37, no. 9, Jan. 2015, pp. 1930-1937. https://doi.org/10.1109/TPAMI.2014.2388210
- A. Zamberletti, L. Noce, and I. Gallo, "Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions," ACCV workshop, 2014.
- Z. Zhang, W. Shen, C. Yao, and X. Bai, "Symmetry-based text line detection in natural scenes," CVPR, 2015.