Real Scene Text Image Super-Resolution Based on Multi-Scale and Attention Fusion

Xinhua Lu;Haihai Wei;Li Ma;Qingji Xue;Yonghui Fu;

doi:10.3745/JIPS.02.0199

Journal of Information Processing Systems

Volume 19, Issue 4 / Pages 427-438 / 2023 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society (한국정보처리학회)

Xinhua Lu (School of Information Engineering, Nanyang Institute of Technology) ;
Haihai Wei (School of Computer and Artificial Intelligence, Zhengzhou University) ;
Li Ma (School of Information Engineering, Nanyang Institute of Technology) ;
Qingji Xue (School of Information Engineering, Nanyang Institute of Technology) ;
Yonghui Fu (School of Computer and Artificial Intelligence, Zhengzhou University)

Received : 2022.07.26
Accepted : 2022.11.14
Published : 2023.08.31

https://doi.org/10.3745/JIPS.02.0199

Abstract

Many studies have shown that single image super-resolution (SISR) models trained on synthetic datasets are difficult to apply to real scene text image super-resolution (STISR), because real degradation is more complex. The most recent dataset for realistic STISR is TextZoom, but current methods trained on it have not considered the effect of multi-scale features of text images. This paper proposes a multi-scale and attention fusion model for realistic STISR. A multi-scale learning mechanism is introduced to acquire sophisticated feature representations of text images, and spatial and channel attention are introduced to capture local information and inter-channel interaction information. Finally, a multi-scale residual attention module is designed by fusing multi-scale learning with the attention mechanisms. Experiments on TextZoom demonstrate that the proposed model increases the average recognition accuracy of the scene text recognizer ASTER by 1.2% compared with the text super-resolution network (TSRN).
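
The abstract does not give the layer-level design of the multi-scale residual attention module, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Fixed mean filters of size 3 and 5 stand in for learned convolutions at two scales, branch averaging stands in for learned fusion, and the attention gates are plain sigmoids of pooled values. All function names (`box_filter`, `channel_attention`, `spatial_attention`, `multi_scale_residual_attention`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def box_filter(channel, k):
    """Mean filter of odd size k with zero padding.

    Assumption: stands in for a learned k x k convolution.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += channel[y][x]
            out[i][j] = total / (k * k)
    return out


def channel_attention(feats):
    """Scale each channel by a sigmoid gate of its global average."""
    out = []
    for ch in feats:
        h, w = len(ch), len(ch[0])
        gate = sigmoid(sum(sum(row) for row in ch) / (h * w))
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(feats):
    """Scale each pixel by a sigmoid gate of its mean over channels."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    gate = [
        [sigmoid(sum(feats[n][i][j] for n in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feats[n][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
        for n in range(c)
    ]


def multi_scale_residual_attention(feats, scales=(3, 5)):
    """Multi-scale branches -> fusion -> channel and spatial attention -> skip connection.

    feats is a list of C channels, each an H x W list of lists of floats.
    """
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    fused = []
    for ch in feats:
        branches = [box_filter(ch, k) for k in scales]
        # Assumption: averaging replaces concatenation plus a learned 1 x 1 convolution.
        fused.append(
            [
                [sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
                for i in range(h)
            ]
        )
    attended = spatial_attention(channel_attention(fused))
    # Residual connection: add the block input back to the attended features.
    return [
        [[feats[n][i][j] + attended[n][i][j] for j in range(w)] for i in range(h)]
        for n in range(c)
    ]
```

Calling `multi_scale_residual_attention` on a feature map of shape C x H x W returns a map of the same shape. In a trained network the filters, fusion weights, and attention gates would all be learned parameters rather than the fixed operations used here.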

Acknowledgement

This work was sponsored by the Natural Science Foundation of Henan (No. 222300420504) and the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX262Y).
