Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier

Improving Recognition of Text in Images by Attaching an OCR Engine-Based Classifier Add-on

  • Chae, Ho-Yeol (Dept. of Computer and Communications Engineering, Kangwon National University)
  • Seok, Ho-Sik (Dept. of Computer Science and Engineering, Kangwon National University)
  • Received : 2020.11.26
  • Accepted : 2020.12.28
  • Published : 2020.12.31

Abstract

An autonomous agent operating in the real world should be able to recognize text in scenes. With the advancement of deep learning, various DNN models have been used for transformation, feature extraction, and prediction. However, existing state-of-the-art STR (Scene Text Recognition) engines do not achieve the performance required for real-world applications. In this paper, we introduce a method that improves the performance of STR engines through an add-on composed of an OCR (Optical Character Recognition) engine and a classifier. On instances from the IC13 and IC15 datasets that an STR engine failed to recognize, our method recognizes up to 10.92% of the previously unrecognized characters.

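Recognizing text that appears in images or on objects is essential for building an autonomous agent that operates in everyday environments. Various deep learning models are used to apply input transformation, feature extraction, and word prediction to a given image and to output the words present in the recognized text. Although the remarkable object recognition ability of deep neural networks has greatly improved recognition performance, it still falls short of what real-world deployment requires. To improve recognition performance, this paper proposes adding an add-on composed of an OCR engine and a classifier to the pipeline of text region detection, text recognition, and word prediction, so that the add-on attempts to recognize text that the existing pipeline missed. Applying the proposed method to the IC13 and IC15 datasets, we confirmed that, at the character level, it recognizes up to 10.92% of the characters that the existing pipeline failed to recognize.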
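The evaluation idea in the abstract (consult the add-on only on instances the STR engine got wrong, then count how many of the missed characters it recovers) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `missed_positions`, `recovery_rate`, the position-by-position character comparison, and the toy dictionaries standing in for the STR engine and the add-on are all assumptions introduced here.

```python
from typing import Callable, Iterable, List, Tuple


def missed_positions(truth: str, predicted: str) -> List[int]:
    """Indices of ground-truth characters that the prediction got wrong or omitted.

    Assumption: characters are compared position by position; the paper may
    align characters differently.
    """
    return [i for i, ch in enumerate(truth)
            if i >= len(predicted) or predicted[i] != ch]


def recovery_rate(samples: Iterable[Tuple[str, str]],
                  str_engine: Callable[[str], str],
                  addon: Callable[[str], str]) -> float:
    """Fraction of characters missed by the STR engine that the add-on recovers."""
    missed = 0
    recovered = 0
    for image, truth in samples:
        first = str_engine(image)
        if first == truth:
            continue  # the add-on is only consulted on STR failures
        second = addon(image)
        for i in missed_positions(truth, first):
            missed += 1
            if i < len(second) and second[i] == truth[i]:
                recovered += 1
    return recovered / missed if missed else 0.0


# Toy stand-ins (hypothetical): image id -> recognized string.
samples = [("img1", "EXIT"), ("img2", "OPEN"), ("img3", "STOP")]
str_out = {"img1": "EXIT", "img2": "0PEN", "img3": "ST0B"}
addon_out = {"img1": "EXIT", "img2": "OPEN", "img3": "STOB"}

rate = recovery_rate(samples, str_out.__getitem__, addon_out.__getitem__)
print(f"{rate:.2%}")  # 2 of 3 missed characters recovered -> 66.67%
```

In this toy run the STR stand-in misses three characters across two images and the add-on stand-in recovers two of them. In a real setup the two callables would wrap an STR engine and the OCR-plus-classifier add-on, respectively.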

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2018R1D1A1B07047156).
