Recent Trends in Deep Learning-Based Optical Character Recognition

딥러닝 기반 광학 문자 인식 기술 동향

  • Published: 2022.10.01

Abstract

Optical character recognition (OCR) is a core technology in many fields, including the digitization of archival documents, industrial automation, autonomous driving, video analytics, medicine, and finance. It originated in 1928 as a pattern-matching technique and, with the advent of artificial intelligence, has since evolved into a high-performance character recognition technology. Recent research addresses the detection of curved text and of characters set against complicated backgrounds. In addition, deep learning models are being developed to recognize text in various orientations and resolutions, text affected by perspective distortion, illumination reflection, or partial occlusion, and text in complex fonts, special characters, and artistic styles. This report reviews recent deep learning-based text detection and recognition methods and their applications.

Acknowledgement

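This research was conducted under the 2022 Culture Technology R&D Program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency [Project title: Development of AI-Based Open Translation and Interpretation Support Technology for Classical Chinese Texts, Project number: R2021040267, Contribution rate: 100%].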
