Automatic gasometer reading system using selective optical character recognition

  • Lee, Kyohyuk (Management of Technology, Yonsei University) ;
  • Kim, Taeyeon (Computer Science, KAIST) ;
  • Kim, Wooju (Graduate School of Information and Industrial Engineering, Yonsei University)
  • Received : 2020.04.11
  • Accepted : 2020.05.14
  • Published : 2020.06.30

Abstract

In this paper, we propose an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many kinds of character information, and conventional optical character recognition extracts all of it. Some applications, however, need to ignore characters that are not of interest and focus only on specific types of characters. Automatic gasometer reading is one such application: to bill users, it only needs to extract the device ID and gas usage amount from gasometer images. Character strings that are not of interest, such as the device type, manufacturer, manufacturing date, and specifications, carry no value for the application. The application therefore has to analyze only the regions of interest and the specific types of characters they contain.

We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the regions of interest. We built three neural networks for the application system. The first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings; the second is another convolutional neural network that transforms the spatial information of each region of interest into a sequence of feature vectors; and the third is a bidirectional long short-term memory (LSTM) network that converts this sequential information into character strings by time-series analysis, mapping feature vectors to characters. In this research, the character strings of interest are the device ID, which consists of 12 Arabic numerals, and the gas usage amount, which consists of 4 to 5 Arabic numerals.

All system components are implemented on the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and NVIDIA Tesla V100 GPUs. The system adopts a master-slave processing structure for efficient, fast parallel processing of about 700,000 requests per day. A mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request into an input queue with a FIFO (First In, First Out) structure. The slave process, which consists of the three deep neural networks that perform character recognition, runs on the NVIDIA GPU and continuously polls the input queue for recognition requests. When requests are present, the slave process converts each queued image into the device ID string, the gas usage amount string, and the position information of both strings, pushes the result to an output queue, and returns to polling the input queue. The master process then takes the final result from the output queue and delivers it to the mobile device.
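As a rough illustration, the master-slave queue flow described above could be organized as in the following sketch, which uses Python's standard multiprocessing module. The recognize() stub and all names are hypothetical placeholders, not the authors' actual implementation.

```python
# Minimal sketch of the master-slave queue structure (illustrative only).
# The recognize() stub stands in for the three deep neural networks;
# in the real system the slave runs on a GPU and the master on a CPU.
import multiprocessing as mp


def recognize(image_bytes):
    """Hypothetical stand-in for ROI detection + CRNN recognition."""
    # A real implementation would return the recognized strings and
    # their bounding boxes; here we return fixed placeholder values.
    return {
        "device_id": "000000000000",      # 12-digit device ID
        "usage": "01234",                 # 4-5 digit gas usage amount
        "device_id_box": (0, 0, 0, 0),
        "usage_box": (0, 0, 0, 0),
    }


def slave(input_queue, output_queue):
    """Slave process: poll the FIFO input queue, run recognition, push results."""
    while True:
        request_id, image_bytes = input_queue.get()   # blocks until a request arrives
        if request_id is None:                        # sentinel used here to stop the sketch
            break
        output_queue.put((request_id, recognize(image_bytes)))


def master():
    """Master process: enqueue reading requests and collect results."""
    input_queue = mp.Queue()    # FIFO queue of reading requests
    output_queue = mp.Queue()   # FIFO queue of recognition results
    worker = mp.Process(target=slave, args=(input_queue, output_queue))
    worker.start()

    # In the real system the images arrive from mobile devices over private LTE.
    for request_id in range(3):
        input_queue.put((request_id, b"fake-image-bytes"))

    for _ in range(3):
        request_id, result = output_queue.get()
        print(request_id, result)          # here the result would go back to the device

    input_queue.put((None, None))          # stop the slave
    worker.join()


if __name__ == "__main__":
    master()
```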
We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation and 4,135 images for testing. For each training epoch, the 22,985 images were randomly split into training and validation sets at an 8:2 ratio. The 4,135 test images were categorized into five types: normal, noise, reflection, scale, and slant. Normal data are clean images; noise denotes images with noise signals; reflection denotes images with light reflections on the gasometer region; scale denotes images in which the object appears small because of long-distance capturing; and slant denotes images that are not horizontally level. The final character string recognition accuracies for the device ID and the gas usage amount on normal data are 0.960 and 0.864, respectively.
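A minimal sketch of the per-epoch random 8:2 split described above, assuming images are referenced by file path; the helper name and seeding scheme are illustrative, not the authors' code:

```python
import random


def split_for_epoch(image_paths, train_ratio=0.8, seed=None):
    """Randomly split image paths into training and validation sets (8:2 by default)."""
    rng = random.Random(seed)
    shuffled = image_paths[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Example: a fresh split is drawn at the start of every training epoch.
paths = [f"img_{i:05d}.jpg" for i in range(22985)]
for epoch in range(3):
    train_set, val_set = split_for_epoch(paths, seed=epoch)
    print(epoch, len(train_set), len(val_set))   # 18388 training, 4597 validation
```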

In this study, we propose an application system architecture that transmits gasometer photographs acquired with a mobile device to a server and analyzes them to recognize the gas usage amount and the meter's device ID, so that billing for gas usage can be handled automatically. The mobile device is comparable to an ordinary consumer smartphone, and the acquired images are transmitted to the server over the gas supplier's private LTE network. The server analyzes the received image to extract the gasometer device ID and gas usage information, and returns the analysis result to the mobile device over the private LTE network. In general, an image contains many kinds of character information, but there are application domains, such as the automatic gasometer reading addressed in this study, in which only specific types of character information are useful. For this application, a string-of-interest recognition technique is needed that selectively detects and recognizes only the device ID and gas usage information among the many character strings in a gasometer photograph. To this end, we applied CNN (Convolutional Neural Network) based object detection to extract the regions of the gas usage amount and the device ID in the image, and applied a CRNN (Convolutional Recurrent Neural Network) to each extracted string region to recognize the whole string at once. The string-of-interest recognition architecture proposed in this study consists of three deep neural networks. The first is a convolutional neural network that detects the regions of the strings of interest; the second is a convolutional neural network that extracts features column by column from the image within each region; and the third is a time-series analysis network that converts the sequence of column-wise feature vectors into a character string. The strings of interest are the 12-digit device ID and the 4-to-5-digit usage amount, and the recognition accuracies are 0.960 and 0.864, respectively. The entire system was implemented in the Amazon Web Services cloud environment using Intel Xeon E5-2686 v4 CPUs and NVIDIA Tesla V100 GPUs. A master-slave processing structure was adopted for fast, parallel handling of 700,000 reading requests per day. The master process runs on the CPU and stores reading requests from mobile devices in an input queue. The slave process consists of the deep neural networks that perform string recognition and runs on the GPU; it converts each image in the input queue into the device ID string, the device ID position, the usage amount string, and the usage amount position, and stores them in an output queue. The master process delivers the reading information stored in the output queue to the mobile device.
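For concreteness, the column-by-column feature extraction followed by a bidirectional LSTM that both summaries describe might look roughly like the sketch below. The layer sizes, input resolution, and character set are assumptions for illustration only, not the authors' exact configuration.

```python
# Illustrative CRNN-style recognizer for a cropped region of interest
# (assumed shapes and layer sizes; not the authors' exact network).
import torch
import torch.nn as nn


class TinyCRNN(nn.Module):
    """Convolutional feature extractor + bidirectional LSTM over image columns."""

    def __init__(self, num_classes=11):        # assumed: 10 digits + 1 blank symbol
        super().__init__()
        # Convolutional part: extracts a feature map from the cropped ROI image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # H/2, W/2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # H/4, W/4
        )
        # Recurrent part: reads the feature map column by column, left to right.
        self.rnn = nn.LSTM(input_size=64 * 8, hidden_size=128,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 128, num_classes)  # per-column character logits

    def forward(self, x):                      # x: (batch, 1, 32, W), grayscale crop
        feats = self.conv(x)                   # (batch, 64, 8, W/4)
        b, c, h, w = feats.shape
        # Flatten each column into one feature vector -> a left-to-right sequence.
        seq = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, _ = self.rnn(seq)                 # (batch, W/4, 256)
        return self.fc(out)                    # (batch, W/4, num_classes)


# Example: a 32x128 crop yields 32 column-wise predictions, which a CTC-style
# decoder would then collapse into the final digit string.
model = TinyCRNN()
logits = model(torch.randn(1, 1, 32, 128))
print(logits.shape)                            # torch.Size([1, 32, 11])
```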
